THE POPULATION FREQUENCIES OF SPECIES AND THE
ESTIMATION OF POPULATION PARAMETERS 


By I. J. GOOD 


A random sample is drawn from a population of animals of various species. (The theory may also be 
applied to studies of literary vocabulary, for example.) If a particular species is represented r times 
in the sample of size N, then r/N is not a good estimate of the population frequency, p, when r is small. 
Methods are given for estimating p, assuming virtually nothing about the underlying population. 
The estimates are expressed in terms of smoothed values of the numbers $n_r$ ($r = 1, 2, 3, \ldots$), where $n_r$
is the number of distinct species that are each represented $r$ times in the sample. ($n_r$ may be described
as ‘the frequency of the frequency $r$’.) Turing is acknowledged for the most interesting formula in this
part of the work. An estimate of the proportion of the population represented by the species occurring 
in the sample is an immediate corollary. Estimates are made of measures of heterogeneity of the 
population, including Yule’s ‘characteristic’ and Shannon’s ‘entropy’. Methods are then discussed that
do depend on assumptions about the underlying population. It is here that most work has been done
by other writers. It is pointed out that a hypothesis can give a good fit to the numbers $n_r$ but can give
quite the wrong value for Yule’s characteristic. An example of this is Fisher’s fit to some data of 
Williams’s on Macrolepidoptera. 


1. Introduction. We imagine a random sample to be drawn from an infinite population of
animals of various species. Let the sample size be $N$ and let $n_r$ distinct species be each
represented exactly $r$ times in the sample, so that

$$\sum_{r=1}^{\infty} r\,n_r = N. \tag{1}$$

The sample tells us the values of $n_1, n_2, \ldots$, but not of $n_0$. In fact it is not quite essential that
$n_0$ should be finite, though we shall find it convenient to suppose that it is.

We shall suggest a method of estimating, among other things, 

(i) the population frequency of each species; 

(ii) the total population frequency of all species represented in the sample, or, as we may 
say, ‘the proportion of the population represented by (the species occurring in) the sample’; 

(iii) various general population parameters measuring heterogeneity, including ‘entropy’. 
By ‘general’ parameters we mean parameters defined without reference to any special form 
of hypothesis. In §7 we shall consider the estimation of parameters for hypotheses of special 
forms.

Our results are applicable, for example, to studies of literary vocabulary, of accident 
proneness and of chess openings, but for definiteness we formulate the theory in terms of 
species of animals. 

The formula (2) was first suggested to me, together with an intuitive demonstration, by 
Dr A. M. Turing several years ago. Hence a very large part of the credit for the present 
paper should be given to him, and I am most grateful to him for allowing me to publish this 
work. 

Reasonably precise conditions under which our general results are applicable will be given 
in §4, but we state at once that the larger $n_1$ is, the more applicable the results. When $n_1$ is
large, $n_0$ will also be large, but we shall not for the most part attempt to estimate it. There
will be a fleeting reference to the estimation of $n_0$ at the end of §5 and a few more references
in §§7 and 8. (See, for example, equation (73).) For populations of known finite size, the
problem has been considered by Goodman (1949). He proved that if the sample size is not
less than the maximum number of individuals in the population belonging to a single
species, then there is only one unbiased estimate of $n_0$, and he found it. He also pointed out
that the unbiased estimate is liable to be unreasonable and suggested some alternative 
estimates that are always reasonable. There is practically no overlapping between the 
present work and that of Goodman. 

Jeffreys (1948, §3.23) has discussed what is superficially the same problem as (i) above,
under the heading ‘multiple sampling’. He refers to some earlier work of Johnson (1932). 
The methods of Johnson and Jeffreys depend on assumptions that, as Jeffreys himself 
points out, are not always acceptable. Moreover, their methods are not intended to be 
applicable when $n_0$ is unknown. The matter is taken up again in §2.

Other work on the frequencies of species has been mainly concerned with the fitting of 
particular distributions to the data, with or without a theoretical explanation of why these 
distributions might be expected to be suitable. See, for example, Anscombe (1950), Chambers 
& Yule (1942), Corbet, Fisher & Williams (1943), Greenwood & Yule (1920), Newbold (1927), 
Preston (1948), Yule (1944) and Zipf (1932). The methods of the first six sections of the 
present paper are largely independent of the distributions of population frequencies. 

We shall be largely concerned with $q_r$, the population frequency of an arbitrary species
that is represented $r$ times in the sample. We shall use the notation $\mathscr{E}(q_r)$ for the expected
value of $q_r$, in a sense to be explained in §2. Our main result, expressed rather loosely, is
that the expected value of $q_r$ is $r^*/N$, where

$$r^* \approx (r+1)\,\frac{n_{r+1}}{n_r}. \tag{2}$$

(The symbol ‘$\approx$’ is used throughout to mean ‘is approximately equal to’.) More precisely,
the $n_r$’s should first be smoothed before applying formula (2). Smoothing is briefly discussed
in §3 with examples in §8. If the smoothed values are denoted by $n'_1, n'_2, n'_3, \ldots$, then the
more accurate form of equation (2) is

$$r^* = (r+1)\,\frac{n'_{r+1}}{n'_r}. \tag{2'}$$

The reader will find it instructive to consider the special case when $n'_r$ is of the Poisson
form $s\,e^{-a}a^r/r!$. Then $r^*$ reduces to a constant.
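As an illustration of formula (2), the following minimal sketch computes the unsmoothed $r^*$ from a list of per-species counts. The function names `freq_of_freqs` and `turing_r_star` and the sample data are hypothetical, introduced only for this example; no smoothing is attempted, so the estimate breaks down wherever a neighbouring $n_{r+1}$ happens to be zero.

```python
import math
from collections import Counter

def freq_of_freqs(species_counts):
    """Return a mapping r -> n_r: how many distinct species occur exactly r times."""
    return Counter(c for c in species_counts if c > 0)

def turing_r_star(n, r):
    """Unsmoothed r* = (r + 1) * n_{r+1} / n_r, as in formula (2).

    Returns None when n_r is zero, since the ratio is then undefined.
    """
    if n.get(r, 0) == 0:
        return None
    return (r + 1) * n.get(r + 1, 0) / n[r]

# Hypothetical sample: number of occurrences of each observed species.
counts = [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 5]
n = freq_of_freqs(counts)   # n_1 = 6, n_2 = 3, n_3 = 2, n_5 = 1
N = sum(counts)             # sample size, equation (1)

one_star = turing_r_star(n, 1)      # 2 * n_2 / n_1
three_star = turing_r_star(n, 3)    # n_4 = 0, so the raw estimate collapses to 0

# Poisson special case: n_r = s * exp(-a) * a**r / r! gives r* = a for every r.
a, s = 2.5, 1000.0
poisson = {r: s * math.exp(-a) * a**r / math.factorial(r) for r in range(12)}
poisson_star = turing_r_star(poisson, 3)
```

The collapse of `three_star` to zero is the kind of behaviour that the smoothing of §3 is meant to prevent.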
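The formula (2) can be generalized to give higher moments of $q_r$. In fact

$$\mathscr{E}(q_r^m) \approx \frac{(r+m)^{(m)}}{N^m}\,\frac{n'_{r+m}}{n'_r} \qquad (r = 1, 2, 3, \ldots;\ m = 0, 1, 2, \ldots), \tag{3}$$

where $t^{(m)} = t(t-1)\cdots(t-m+1)$. We can also write (3) in the form

$$\mathscr{E}(q_r^m) \approx \mathscr{E}(q_r)\,\mathscr{E}(q_{r+1})\cdots\mathscr{E}(q_{r+m-1}). \tag{4}$$

Moreover, the variance of $q_r$ is

$$V(q_r) \approx \frac{r+1}{N^2}\,\frac{n'_{r+1}}{n'_r}\left(\frac{(r+2)\,n'_{r+2}}{n'_{r+1}} - \frac{(r+1)\,n'_{r+1}}{n'_r}\right) \approx \mathscr{E}(q_r)\,[\mathscr{E}(q_{r+1}) - \mathscr{E}(q_r)]. \tag{5}$$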
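The relations (3), (4) and (5) can be checked against one another numerically. The sketch below assumes a hypothetical set of already smoothed values `n_s` and a sample size `N`; the helper names are illustrative only.

```python
def falling(t, m):
    """Falling factorial t^(m) = t (t-1) ... (t-m+1), with t^(0) = 1."""
    out = 1
    for j in range(m):
        out *= (t - j)
    return out

def moment_q(n_s, N, r, m):
    """Approximate E(q_r^m) from formula (3), using smoothed values n_s[r]."""
    return falling(r + m, m) / N**m * n_s[r + m] / n_s[r]

def variance_q(n_s, N, r):
    """Approximate V(q_r) from the second form of formula (5)."""
    e_r = moment_q(n_s, N, r, 1)
    e_r1 = moment_q(n_s, N, r + 1, 1)
    return e_r * (e_r1 - e_r)

# Hypothetical smoothed frequencies of frequencies and sample size.
n_s = {1: 120.0, 2: 40.0, 3: 20.0, 4: 12.0}
N = 1000

second_moment = moment_q(n_s, N, 1, 2)                          # formula (3), m = 2
product_form = moment_q(n_s, N, 1, 1) * moment_q(n_s, N, 2, 1)  # formula (4)
var_q1 = variance_q(n_s, N, 1)                                  # formula (5)
```

Here `second_moment` and `product_form` agree, and `var_q1` equals the second moment minus the squared first moment, as (5) requires.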


An immediate deduction from (2) is that the expected total chance of all species that are each
represented $r$ times ($r \geq 1$) in the sample is approximately

$$(r+1)\,n_{r+1}/N. \tag{6}$$

Hence also the expected total chance of all species that are represented $r$ times or more in
the sample is approximately

$$N^{-1}\{(r+1)\,n_{r+1} + (r+2)\,n_{r+2} + \ldots\}. \tag{7}$$

In particular, the expected total chance of all species represented at all in the sample is
approximately

$$N^{-1}(2n_2 + 3n_3 + \ldots) = 1 - n_1/N. \tag{8}$$

We may say that the proportion of the population represented by the sample is approximately
$1 - n_1/N$, and the chance that the next animal sampled will belong to a new species is
approximately

$$n_1/N. \tag{9}$$

(Thus (6) is true even if $r = 0$.)

The results (6), (7), (8) and (9) are improved in accuracy by writing the respective
formulae as

$$\frac{(r+1)\,n'_{r+1}\,n_r}{n'_r\,N}, \tag{6'}$$

$$\frac{(r+1)\,n'_{r+1}\,n_r}{n'_r\,N} + \frac{(r+2)\,n'_{r+2}\,n_{r+1}}{n'_{r+1}\,N} + \ldots, \tag{7'}$$

$$\frac{2n'_2\,n_1}{n'_1\,N} + \frac{3n'_3\,n_2}{n'_2\,N} + \ldots \tag{8'}$$

and

$$\frac{n'_1\,n_0}{n'_0\,N}. \tag{9'}$$

In most applications this last expression will be extremely close to $n'_1/N$, and this in its turn
will often be very close to $n_1/N$. It follows that (8’) and (9’) are practically the same as (8)
and (9). For the sake of mathematical consistency, the smoothing should be such that (8’)
and (9’) add up to 1.
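The unsmoothed estimates (7), (8) and (9) need only the frequencies of frequencies and the sample size. A minimal sketch, with hypothetical function names and data:

```python
def unseen_probability(n, N):
    """Estimate (9): chance that the next animal belongs to a new species, n_1 / N."""
    return n.get(1, 0) / N

def sample_coverage(n, N):
    """Estimate (8): proportion of the population represented by the sample."""
    return 1.0 - unseen_probability(n, N)

def mass_at_least(n, N, r):
    """Estimate (7): total chance of all species represented r times or more.

    (7) sums (k + 1) * n_{k+1} over k >= r, i.e. j * n_j over j >= r + 1.
    """
    return sum(j * n_j for j, n_j in n.items() if j >= r + 1) / N

# Hypothetical frequencies of frequencies: n_1 = 6, n_2 = 3, n_3 = 2, n_5 = 1.
n = {1: 6, 2: 3, 3: 2, 5: 1}
N = sum(r * n_r for r, n_r in n.items())   # equation (1): N = 23

coverage = sample_coverage(n, N)           # 1 - 6/23
new_species = unseen_probability(n, N)     # 6/23
```

With `r = 1` the sum in `mass_at_least` reproduces the coverage, which is the identity stated in (8).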

An index of notations used in a fixed sense is given in §9. 

I am grateful, and my readers will also be grateful, to Prof. M. G. Kendall for forcing me 
to clarify some obscurities, especially in §§1 and 2. 


2. Proofs. Let the number of species in the population be $s$, which we suppose is finite.
This is the same supposition as that $n_0$ is finite. Our results as far as §6 would be practically
unchanged if $s$ were enumerably infinite, but the proofs are more rigorous when it is finite.
Let the population frequencies of the species be, in some order, $p_1, p_2, \ldots, p_s$, where

$$p_1 + p_2 + \ldots + p_s = 1, \qquad n_0 + n_1 + \ldots = s.$$

Let $H$, or more explicitly $H(p_1, p_2, \ldots, p_s)$, be the statistical hypothesis asserting that
$p_1, p_2, \ldots, p_s$ are the population frequencies. We shall discuss the expectation of $n_r$, given $H$.
It may be objected that the expectation of $n_r$ is simply the observed number $n_r$, whatever
the information, and this objection would be logically correct. Strictly we should introduce
extra notation, say $\nu_{r,N}$, for the random variable that is the frequency of the frequency $r$ in
a random sample of size $N$. Then we could introduce the notation $\mathscr{E}(\nu_{r,N} \mid H)$ for the expectation
of $\nu_{r,N}$ given $H$. (Logically this expectation would remain unaffected if particular
values of $n_1, n_2, n_3, \ldots$ were given.) In order to avoid the extra notation $\nu_{r,N}$ we shall
write $\mathscr{E}(n_r)$ or $\mathscr{E}(n_r \mid H)$ or $\mathscr{E}_N(n_r \mid H)$ instead of $\mathscr{E}(\nu_{r,N} \mid H)$. Confusion can be avoided by
reading $\mathscr{E}_N(n_r \mid H)$ as ‘the expectation of the frequency of the frequency $r$ when $H$ is given
and when the sample size is $N$’. Similarly, we write $V(n_r) = V(n_r \mid H) = V_N(n_r \mid H)$ for the
variance of $\nu_{r,N}$ given $H$, and $\mathscr{E}_N(n_r^2 \mid H)$, etc., for $\mathscr{E}(\nu_{r,N}^2 \mid H)$.

We recall the theorem that an expectation of a sum is the sum of the expectations. It
follows that $\mathscr{E}_N(n_r \mid H)$ is the sum over all $s$ species of the probabilities that each will occur
$r$ times, given $H$. So

$$\mathscr{E}_N(n_r \mid H) = \mathscr{E}(n_r \mid H) = \mathscr{E}(n_r) = \sum_{\mu=1}^{s} \binom{N}{r} p_\mu^r (1-p_\mu)^{N-r}. \tag{10}$$

In particular

$$\mathscr{E}_N(n_0 \mid H) = \sum_{\mu=1}^{s} (1-p_\mu)^N. \tag{11}$$

If $s$ were infinite this series would diverge. The divergence would be appropriate since $n_0$
would also be infinite.
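When the population frequencies are known, formula (10) can be evaluated directly. The sketch below uses a small hypothetical population `p`; it also confirms two bookkeeping identities that follow from (10), namely that the expected $n_r$ sum to $s$ and that the expected $r\,n_r$ sum to $N$.

```python
from math import comb

def expected_n_r(p, N, r):
    """Exact E_N(n_r | H) from formula (10), for known population frequencies p."""
    return sum(comb(N, r) * pm**r * (1 - pm)**(N - r) for pm in p)

# Hypothetical population of s = 6 species whose frequencies sum to 1.
p = [0.5, 0.2, 0.1, 0.1, 0.05, 0.05]
N = 20

expected = [expected_n_r(p, N, r) for r in range(N + 1)]
expected_unseen = expected[0]                     # formula (11)
total_species = sum(expected)                     # should equal s
total_animals = sum(r * e for r, e in enumerate(expected))  # should equal N
```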

Now suppose that in a sample of size $N$ a particular species occurs $r$ times ($r = 0, 1, 2, \ldots$).
We shall consider the final (posterior) probability that this species is the $\mu$th one (of population
frequency $p_\mu$). For the sake of rigour it is necessary to define more precisely how the
species is selected for consideration. We shall suppose that it is sampled ‘at random’, or
rather equiprobably, from the $s$ species, and that then its number of occurrences in the
sample is counted. Thus the initial (prior) probability that the species is the $\mu$th one is $1/s$.
If the species is the $\mu$th one then the likelihood that the observed number of occurrences
is $r$ is

$$\binom{N}{r} p_\mu^r (1-p_\mu)^{N-r}.$$

We write $q_r$ for the (unknown) population frequency of an arbitrary species that is represented
$r$ times in the sample. The final probability that the species is the $\mu$th one can be written as
$P(q_r = p_\mu \mid H)$ provided that the $p_\mu$’s are unequal. (If any of the $p_\mu$’s are equal they can be
adjusted microscopically so as to be made unequal. These adjustments will have no practical
effect.) We may at once deduce the final probability that the species is the $\mu$th one by using
Bayes’s theorem in the form that the final probabilities are proportional to the initial ones
times the likelihoods. We find that

$$P(q_r = p_\mu \mid H) = \frac{p_\mu^r (1-p_\mu)^{N-r}}{\sum_{\nu=1}^{s} p_\nu^r (1-p_\nu)^{N-r}}. \tag{12}$$





It follows that for any positive integer $m$,

$$\mathscr{E}(q_r^m \mid H) = \frac{\sum_{\mu} p_\mu^{r+m} (1-p_\mu)^{N-r}}{\sum_{\mu} p_\mu^{r} (1-p_\mu)^{N-r}} \tag{13}$$

$$= \frac{(r+m)^{(m)}}{(N+m)^{(m)}}\,\frac{\mathscr{E}_{N+m}(n_{r+m} \mid H)}{\mathscr{E}_N(n_r \mid H)}, \tag{14}$$

in view of (10) and of (10) with $N$ replaced by $N+m$. Immediate consequences of (14) are
the basic result

$$\mathscr{E}(q_r \mid H) = \frac{r+1}{N+1}\,\frac{\mathscr{E}_{N+1}(n_{r+1} \mid H)}{\mathscr{E}_N(n_r \mid H)} \qquad (r = 0, 1, 2, \ldots) \tag{15}$$

and

$$V(q_r \mid H) = \frac{\mu_{r,2,N}\,\mu_{r,0,N} - \mu_{r,1,N}^2}{\mu_{r,0,N}^2}, \tag{16}$$

where

$$\mu_{r,t,N} = \frac{(r+t)!}{(N+t)!}\,\mathscr{E}_{N+t}(n_{r+t} \mid H) \tag{17}$$

$$= \frac{1}{(N-r)!} \sum_{\mu=1}^{s} p_\mu^{r+t} (1-p_\mu)^{N-r} = \frac{r!}{N!}\,\mathscr{E}_N(n_r \mid H)\,\mathscr{E}(q_r^t \mid H), \tag{18}$$

by (10) and (14). It is clear from either form of (18) that the numbers $\mu_{r,t,N}$ ($t = 0, 1, 2, \ldots$)
form a sequence of moment constants and therefore satisfy Liapounoff’s inequality. (See,
for example, Good (1950a), or Uspensky (1937).) This checks that the right side of (16) is
positive, as it should be, being a variance. [It is obvious incidentally that (16) would be
true with $\mu_{r,t,N}$ defined as $\mathscr{E}(q_r^t \mid H)$ times any expression independent of $t$.]
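Because (14) is exact, it can be verified numerically for any fully specified $H$. The following sketch compares the direct ratio of sums in (13) with the form (14) for a hypothetical population; all names are illustrative.

```python
from math import comb

def expected_n_r(p, N, r):
    """Exact E_N(n_r | H) from formula (10)."""
    return sum(comb(N, r) * pm**r * (1 - pm)**(N - r) for pm in p)

def falling(t, m):
    """Falling factorial t^(m) = t (t-1) ... (t-m+1)."""
    out = 1
    for j in range(m):
        out *= (t - j)
    return out

def moment_direct(p, N, r, m):
    """E(q_r^m | H) as the ratio of sums in formula (13)."""
    num = sum(pm**(r + m) * (1 - pm)**(N - r) for pm in p)
    den = sum(pm**r * (1 - pm)**(N - r) for pm in p)
    return num / den

def moment_via_14(p, N, r, m):
    """E(q_r^m | H) written in terms of expected frequencies of frequencies, (14)."""
    ratio = expected_n_r(p, N + m, r + m) / expected_n_r(p, N, r)
    return falling(r + m, m) / falling(N + m, m) * ratio

# Hypothetical population and sample size.
p = [0.5, 0.2, 0.1, 0.1, 0.05, 0.05]
N, r = 20, 2

first = moment_direct(p, N, r, 1)
second = moment_direct(p, N, r, 2)
variance = second - first**2      # the quantity given by (16)
```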

We can now approximate the formulae (14) and (15) by replacing $\mathscr{E}_{N+m}(n_{r+m} \mid H)$ by the
observed value, $n_{r+m}$, in the sample of size $N$, or rather by the smoothed value $n'_{r+m}$. If
$m$ is very small compared with $N$, if $n_r$ and $n_{r+m}$ are not too small and if the sequence
$n_1, n_2, n_3, \ldots$ is smoothed in the neighbourhood of $n_r$ and $n_{r+m}$, then we may expect the
approximations to be good. We thus obtain all the approximate results of §1. Note that
when the approximation is made of replacing $\mathscr{E}_{N+m}(n_{r+m} \mid H)$ by $n'_{r+m}$ we naturally also
change the notation $\mathscr{E}(q_r^m \mid H)$ to $\mathscr{E}(q_r^m)$. For the results become roughly independent of
$H$ unless the $n_r$’s are too small to smooth. Observe that $\mathscr{E}(q_r^m \mid H)$ does not depend on the
sample, unless $H$ is itself determined by using the sample. On the other hand, $\mathscr{E}(q_r^m)$ does
depend on the sample. This may seem a little paradoxical and the following explanation
is perhaps worth giving. When we select a particular sequence of smoothed values
$n'_1, n'_2, n'_3, \ldots$ we are virtually accepting a particular hypothesis $H$, say $H\{N; n'_1, n'_2, n'_3, \ldots\}$,
with curly brackets. (I do not think that this hypothesis is usually a simple statistical
hypothesis.) Then $\mathscr{E}(q_r^m)$ can be regarded as a shorthand for $\mathscr{E}(q_r^m \mid H\{N; n'_1, n'_2, n'_3, \ldots\})$. (If
$H\{\ldots\}$ is not a simple statistical hypothesis this last expression could in theory be given
a definite value by assuming a definite distribution of probabilities of the simple statistical
hypotheses of which $H$ is a disjunction.) When we regard the smoothing as reasonably
reliable we are virtually taking $H\{N; n'_1, n'_2, n'_3, \ldots\}$ for granted, as an approximation, so that
it can be omitted from the notation without serious risk of confusion. In order to remind
ourselves that there is a logical question that is obscured by the notation, we may describe
$\mathscr{E}(q_r^m)$ as, say, a ‘credential expectation’.

If a specific H is accepted it is clearly not necessary to use the approximations since 
equation (13) can then be used directly. Similarly, if H is assumed to be selected from 
a superpopulation, with an assigned probability density, then again it is theoretically 
possible to dispense with the approximations. In fact if the ‘point’ $(p_1, p_2, \ldots, p_s)$ is assumed
to be selected from the ‘simplex’ $p_1 + p_2 + \ldots + p_s = 1$, with probability density proportional
to $(p_1 p_2 \cdots p_s)^{k-1}$, where $k$ is a constant, then it is possible to deduce Johnson’s estimate
$q_r = (r+k)/(N+ks)$. Jeffreys’s estimate is the special case $k = 1$, when the probability
density is uniform. Jeffreys suggests conditions for the applicability of his estimate, but 
these conditions are not valid for our problem in general. This is clear if only because we do 
not assume s to be known. 
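For comparison, Johnson’s estimate is trivial to compute once $s$ is assumed known, which is exactly the assumption the present paper avoids. A minimal sketch with hypothetical data:

```python
def johnson_estimate(r, N, s, k=1.0):
    """Johnson's estimate (r + k) / (N + k s); k = 1 gives Jeffreys's estimate.

    Note that s, the total number of species, must be supplied by the user.
    """
    return (r + k) / (N + k * s)

# Hypothetical sample over s = 5 species, one of which was never observed.
counts = [4, 3, 2, 1, 0]
N = sum(counts)
s = len(counts)

estimates = [johnson_estimate(r, N, s, k=1.0) for r in counts]
```

The estimates add up to 1 over all $s$ species, and the unobserved species receives $k/(N+ks)$ whatever the other frequencies may be, which illustrates the remark that the method ignores the frequencies of the frequencies.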

Jeffreys assumes explicitly that all ordered partitions of N into s non-negative parts are 
initially equally probable, while Johnson assumes that the probability that the next 
individual sampled will be of a particular species depends only on N and on the number 
of times that that species is already represented in the sample. Clearly both methods ignore 
any information that can be obtained from the entire set of frequencies of all species. 
The ignored information is considerable when it is reasonable to smooth the frequencies
of the frequencies. 


3. Smoothing. The purpose of smoothing the sequence $n_1, n_2, n_3, \ldots$ and replacing it by
a new sequence $n'_1, n'_2, n'_3, \ldots$ is to be able to make sensible use of the exact results (14) and
(15). Ignoring the discrepancy between $\mathscr{E}_N$ and $\mathscr{E}_{N+m}$, the best value of $n'_r$ would be $\mathscr{E}_N(n_r \mid H)$,
where $H$ is true. One method of smoothing would be to assume that $H = H(p_1, p_2, \ldots, p_s)$
belongs to some particular set of possible $H$’s, to determine one of these, say $H_0$, by maximum
likelihood and then to calculate $n'_r$ as $\mathscr{E}_N(n_r \mid H_0)$. This method is closely related to that of
Fisher in Corbet et al. (1943). Since one of our aims is to suggest methods which are virtually
distribution-free, it would be most satisfactory to carry out the above method using all
possible $H$’s as the set from which to determine $H_0$. Unfortunately, this theoretically
satisfying method leads to a mathematical problem that I have not solved.

It is worth noticing that the sequence $\{\mathscr{E}_N(n_r \mid H)\}$ ($r = 0, 1, 2, \ldots$) has some properties
invariant with respect to changes in $H$. Ideally the sequence $\{n'_r\}$ should be forced to have
these invariant properties. In particular the sequence $\{\mu_{r,t,N}\}$ ($t = 0, 1, 2, \ldots$), defined by
(17), is a sequence of moment constants. But if $t = o(\sqrt{N})$, then $N^{-t}(r+t)!\,n'_{r+t} \approx N!\,\mu_{r,t,N}$, so
that if $t = o(\sqrt{N})$ we can assume that the sequence $r!\,n'_r$ is a sequence of moment constants
and satisfies Liapounoff’s inequalities. But this simply implies that $0^*, 1^*, 2^*, \ldots, t^*$ forms
an increasing sequence (see equation (2’)), a result which is intuitively obvious even without
the restriction $t = o(\sqrt{N})$. (Indeed, the argument could be reversed in order to obtain a new
proof of Liapounoff’s inequality.) We also intuitively require that $0^*, 1^*, 2^*, \ldots$ should itself
be a ‘smooth’ sequence.
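The requirement that $0^*, 1^*, 2^*, \ldots$ be increasing gives a quick mechanical check on any proposed smoothing. In the sketch below the two sequences are hypothetical: `raw` stands for unsmoothed counts and `smoothed` for one possible smoothing of them.

```python
def r_star_sequence(n_s):
    """Return r -> r* = (r + 1) * n'_{r+1} / n'_r for every r with both terms present."""
    return {r: (r + 1) * n_s[r + 1] / n_s[r] for r in sorted(n_s) if r + 1 in n_s}

def is_increasing(values):
    """True when each value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

# Hypothetical frequencies of frequencies before and after smoothing.
raw = {1: 120, 2: 40, 3: 24, 4: 8, 5: 9}
smoothed = {1: 120.0, 2: 40.0, 3: 20.0, 4: 12.0, 5: 8.0}

raw_ok = is_increasing(list(r_star_sequence(raw).values()))
smoothed_ok = is_increasing(list(r_star_sequence(smoothed).values()))
```

For these numbers the raw sequence fails the check (its $3^*$ falls below its $2^*$) while the smoothed sequence passes.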

Since the sequence $\{\mu_{r,t,N}\}$ ($t = 0, 1, 2, \ldots$) is a sequence of moment constants of a probability
distribution it follows from Hardy (1949, §11.8) that the sequence is ‘totally
increasing’, i.e. that all its finite differences are non-negative. This result is unfortunately 
too weak to be useful for our purposes, but it may be possible to make use of some other 
theorems concerning moment constants. This line of research will not be pursued in the 
present paper. 

A natural principle to adopt when smoothing is that 


$$\chi^2 = \sum_r \frac{(n_r - n'_r)^2}{V(n_r \mid H)} \tag{19}$$


should not be significant with r degrees of freedom. In §5 we shall obtain an approximate 
formula for $V(n_r \mid H)$, applicable when $r^2 = o(N)$. The chi-squared test will therefore be
applicable when $r^2 = o(N)$. [See formulae (22), (25), (26) and, for particular $H$’s, (65), (85),
(86).] 

Another similar principle can be understood by thinking of the histogram of $n_r$ as several
piles of pennies, $n_r$ pennies in the $r$th pile. We may visualize the smoothing as the moving of
pennies from pile to pile, and we may reasonably insist that pennies moved to the $r$th pile
should not have been moved much further horizontally than a distance $\sqrt{r}$ and almost never
further than $2\sqrt{r}$. For $r = 0$ we would not insist on this rule, i.e. we do not insist that
$\sum_{r=1}^{\infty} n'_r = \sum_{r=1}^{\infty} n_r$. The analogy with piles of pennies amounts to saying that a species that
‘should’ have occurred $r$ times is unlikely to have occurred less than $r - \sqrt{r}$ or more than
$r + \sqrt{r}$ times.





Let $N' = \sum_r r\,n'_r$. It seems unnecessary to insist on $N' = N$, provided that $N$ is replaced by
$N'$ in such formulae as $\mathscr{E}(q_r) \approx r^*/N$. It will be convenient, however, in §6 to assume $N' = N$.

For some applications very little smoothing will be required, while for others it may be 
necessary to use quite elaborate methods. For example, we could 

(i) Smooth the $n_r$’s for the range of values of $r$ that interests us, holding in mind the
above chi-squared test and the rule concerning $\sqrt{r}$. The smoothing techniques may include
the use of freehand curves. Rather than working directly with $n_1, n_2, n_3, \ldots$ it may be found
more suitable to work with the cumulative sums $n_1, n_1 + n_2, n_1 + n_2 + n_3, \ldots$ or with the
cumulative sums of the $r n_r$ or with the logarithms $\log n_1, \log n_2, \log n_3, \ldots$. There is much to
be said for working with the numbers $\sqrt{n_1}, \sqrt{n_2}, \sqrt{n_3}, \ldots$. For if we assume that $V(n_r \mid H)$ is
approximately equal to $n_r$ (and in view of (26) and (27) of §5 this approximation is not on
the whole too bad), then it would follow that the standard deviation of $\sqrt{n_r}$ is of the order
of $\tfrac12$ and therefore largely independent of $r$. Hence graphical and other smoothing methods
can be carried out without having constantly to hold in mind that $|n'_r - n_r|$ can reasonably
take much larger values when $n_r$ is large than when it is small. [The square-root transformation
for a Poisson variable, $x$, was suggested by Bartlett (1936) in order to facilitate the
analysis of variance. He showed also that the transformation $\sqrt{x + \tfrac12}$ leads to an even more
constant variance. Anscombe (1948) proved that $\sqrt{x + \tfrac38}$ has the most nearly constant
variance of any variable of the form $\sqrt{x + c}$, namely $\tfrac14$, when the mean of $x$ is large. He
attributes this result to A. H. L. Johnson.]

(ii) Calculate $(r+1)\,n'_{r+1}/n'_r$.

(iii) Smooth these values getting, say, $r^*$.

(iv) Possibly use the values of $r^*$ to improve the smoothing of the $n_r$’s. If this makes
a serious difference it will be necessary to check again that the chi-squared test and the $\sqrt{r}$
rule have not been violated.

(v) Light can be shed on the reliability of the estimates of the $q_r$’s, etc., if the data are
smoothed two or three times, possibly by different people. 
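The procedure above leaves the choice of smoother open. Purely as an illustration of working on the square-root scale, the sketch below applies a three-point moving average to $\sqrt{n_r}$; this particular smoother is an assumption of the example, not the method of the text, and it is cruder than a freehand curve. It also assumes that the values of $r$ supplied are consecutive.

```python
import math

def smooth_sqrt(n, passes=1):
    """Three-point moving average of sqrt(n_r), squared back to give n'_r.

    The two end values are left unchanged. Keys of n are assumed to be
    consecutive integers r; fewer than three values are returned as they are.
    """
    rs = sorted(n)
    y = [math.sqrt(n[r]) for r in rs]
    if len(y) < 3:
        return {r: float(n[r]) for r in rs}
    for _ in range(passes):
        inner = [(y[i - 1] + y[i] + y[i + 1]) / 3 for i in range(1, len(y) - 1)]
        y = [y[0]] + inner + [y[-1]]
    return {r: v * v for r, v in zip(rs, y)}

def n_prime_total(n_s):
    """N' = sum over r of r * n'_r, to be used in place of N if it differs."""
    return sum(r * v for r, v in n_s.items())

# Hypothetical raw frequencies of frequencies.
raw = {1: 120, 2: 40, 3: 24, 4: 8, 5: 9, 6: 4}
sm = smooth_sqrt(raw)
N_prime = n_prime_total(sm)

# Step (ii): r* = (r + 1) * n'_{r+1} / n'_r from the smoothed values.
r_star = {r: (r + 1) * sm[r + 1] / sm[r] for r in range(1, 6)}
```

Any such output should still be examined against the chi-squared test and the $\sqrt{r}$ rule before it is used.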

In short, the estimation of the $q_r$’s should be done in such a way as to be consistent with
the axioms of probability and also with any intuitive judgements that the users of the 
method are not prepared to abandon or to modify. (This recommendation applies to much 
more general theoretical scientific work, though there are rare occasions when it may be 
preferred to abandon the axioms of a science.) 

An objection could be raised to the methods of smoothing suggested in the present 
section. It could be argued that all smoothing methods indirectly assume something about 
the distribution of the $p_\mu$, and that one might just as well apply the method of Greenwood & Yule
(1920) and its modification by Corbet et al. (1943) of assuming a distribution of Pearson’s
Type III, $A p^{\alpha} e^{-\beta p}$, or of some other form. Our reply would be that smoothing can be done
by making only local assumptions, for example, that the square root of $\mathscr{E}(n_r \mid H)$, as a
function of $r$, is approximately ‘parabolic’ for any nine consecutive values of $r$. Moreover,
it may often be more convenient to apply the general methods of the present section than
to attempt to find an adequate hypothesis, $H$.


4. Conditions for the applicability of the results of §§1 and 2. The condition for the applicability
of the results of §§1 and 2 is that the user of the methods should be satisfied with his
approximations to $\mathscr{E}_{N+m}(n_{r+m} \mid H)$ corresponding to the values of $r$ and $m$ used in the
application. This condition is clearly correct, since equation (14) is exact. In particular, if
$n_1$ is large enough the user would be quite happy to deduce (9) from (15) with $r = 0$. Similarly,
he will be satisfied with the estimates of, say, $q_1$, $q_2$ and $q_3$ provided he is satisfied with the
smoothed values $(n'_1, n'_2, n'_3, n'_4)$ of $n_1$, $n_2$, $n_3$ and $n_4$.


5. The variance of $n_r$. For the application of the chi-squared test described in §3 we need
to know more about $V(n_r)$. We begin by obtaining an exact formula for $V(n_r \mid H) = V_N(n_r \mid H)$
and we then make approximations that justify the omission of the symbol $H$ from the
notation. It is convenient to introduce the random variable $x_{\mu,r} = x_\mu$ that is defined as
1 if the ‘$\mu$th species’ (of population frequency $p_\mu$) occurs precisely $r$ times in a sample of size
$N$ ($H$ being given), otherwise $x_\mu = 0$. Clearly $P(x_\mu = 1 \mid H) = \binom{N}{r} p_\mu^r (1-p_\mu)^{N-r}$. Now

$$\begin{aligned}
\mathscr{E}(n_r^2 \mid H) &= \mathscr{E}\Bigl(\sum_\mu x_\mu\Bigr)^2 \\
&= \sum_{\mu,\nu} \mathscr{E}(x_\mu x_\nu) \\
&= \sum_\mu \mathscr{E}(x_\mu) + \sum_{\mu \neq \nu} \mathscr{E}(x_\mu x_\nu) \\
&= \mathscr{E}(n_r \mid H) + \frac{N!}{r!\,r!\,(N-2r)!} \sum_{\mu \neq \nu} p_\mu^r p_\nu^r (1 - p_\mu - p_\nu)^{N-2r}.
\end{aligned} \tag{20}$$

This is exact. We now make some approximations of the sort used in deriving the Poisson
distribution from the binomial. We get, assuming $r^2/N$, $rp_\mu$ and $rp_\nu$ to be small,

$$\binom{N}{r} p_\mu^r (1-p_\mu)^{N-r} \approx \frac{(Np_\mu)^r e^{-Np_\mu}}{r!} = a_\mu, \text{ say},$$

and

$$\frac{N!}{r!\,r!\,(N-2r)!}\, p_\mu^r p_\nu^r (1 - p_\mu - p_\nu)^{N-2r} \approx a_\mu a_\nu.$$


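Moreover, it is intuitively clear that terms for which $p_\mu$ or $p_\nu$ is far from $r/N$ can make no
serious contribution to the summation in (20). Hence, if $r^2 = o(N)$,

$$\mathscr{E}(n_r^2 \mid H) \approx \mathscr{E}(n_r \mid H) + \sum_{\mu \neq \nu} a_\mu a_\nu \approx \mathscr{E}(n_r \mid H) + \{\mathscr{E}(n_r \mid H)\}^2 - \sum_\mu a_\mu^2.$$

Therefore the variance of $n_r$ for samples of size $N$ is

$$V_N(n_r \mid H) \approx \mathscr{E}_N(n_r \mid H) - \sum_\mu \frac{(Np_\mu)^{2r} e^{-2Np_\mu}}{(r!)^2} \tag{21}$$

$$\approx \mathscr{E}_N(n_r \mid H) - \frac{1}{2^{2r}}\binom{2r}{r}\,\mathscr{E}_{2N}(n_{2r} \mid H). \tag{22}$$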
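The quality of the Poisson-type approximation can be examined for a fully specified $H$ by computing the exact variance from (20) and comparing it with (21). The population below (200 species with two distinct frequencies) and the function names are hypothetical; in `approx_variance` the expectation itself is also replaced by its Poisson approximation, the sum of the $a_\mu$.

```python
from math import comb, exp, factorial

def exact_moments(p, N, r):
    """Return (E(n_r | H), V(n_r | H)) using (10) and the exact formula (20)."""
    mean = sum(comb(N, r) * pm**r * (1 - pm)**(N - r) for pm in p)
    coef = comb(N, r) * comb(N - r, r)       # N! / (r! r! (N - 2r)!)
    cross = 0.0
    for i, pi in enumerate(p):
        for j, pj in enumerate(p):
            if i != j:
                cross += coef * pi**r * pj**r * (1 - pi - pj)**(N - 2 * r)
    second = mean + cross                    # E(n_r^2 | H), formula (20)
    return mean, second - mean**2

def approx_variance(p, N, r):
    """Approximation (21): sum of a_mu minus sum of a_mu squared."""
    a = [(N * pm)**r * exp(-N * pm) / factorial(r) for pm in p]
    return sum(a) - sum(x * x for x in a)

# Hypothetical population: 100 species of frequency 0.006 and 100 of 0.004.
p = [0.006] * 100 + [0.004] * 100
N, r = 400, 2

mean, var_exact = exact_moments(p, N, r)
var_approx = approx_variance(p, N, r)
```

For this choice, where $r^2/N$ and $rp_\mu$ are both small, the two variances agree closely and both lie below the mean.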


Formulae (21) and (22) are elegant but need further transformation, when $H$ is unknown,
before they can be used for calculation. Notice first that there are $n_u$ species whose expected
population frequencies are $q_u$ ($u = 0, 1, 2, \ldots$). Hence we have for $r = 0, 1, 2, \ldots$; $r^2 = o(N)$,

$$V(n_r \mid H) \approx \mathscr{E}(n_r \mid H) - \frac{1}{(r!)^2} \sum_{u} n_u (Nq_u)^{2r} e^{-2Nq_u} = \mathscr{E}(n_r \mid H) - \sum_{u} n_u \left(\frac{(u^*)^r e^{-u^*}}{r!}\right)^2. \tag{23}$$
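Similarly and rather more simply, when $r^2 = o(N)$,

$$\mathscr{E}(n_r \mid H) \approx \sum_{u} n_u\,\frac{(u^*)^r e^{-u^*}}{r!}. \tag{24}$$

Now for any positive $x$, $x^r e^{-x} \leq r^r e^{-r}$, so

$$\mathscr{E}(n_r \mid H) > V(n_r \mid H) > \mathscr{E}(n_r \mid H)\left(1 - \frac{r^r e^{-r}}{r!}\right). \tag{25}$$

Using Stirling’s formula for $r!$ we have

$$\mathscr{E}(n_r \mid H) \gtrsim V(n_r \mid H) \gtrsim \mathscr{E}(n_r \mid H)\left(1 - \frac{1}{\sqrt{2\pi r}}\right) \qquad (r = 2, 3, \ldots), \tag{26}$$

while

$$\mathscr{E}(n_1 \mid H) > V(n_1 \mid H) > \mathscr{E}(n_1 \mid H)\,(1 - e^{-1}) \tag{27}$$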


(see also formula (65) in §7). Now the most desirable value for $n'_r$ would be $\mathscr{E}(n_r \mid H)$ where
$H$ is true, so if our smoothing of the $n_r$’s is to be satisfactory for any particular values of
$r$ small compared with $\sqrt{N}$ we may write

$$n'_r \approx \sum_{u=0}^{\infty} n'_u\,\frac{(u^*)^r e^{-u^*}}{r!} \tag{28}$$

and these approximate equations may be used as a test of consistency for the values of
$n'_r$ and $u^*$. Indeed, it may be possible iteratively to solve equations (28) combined with
(2’) and thus very systematically to obtain estimates of $n'_r$ and $r^*$ for values of $r$ small
compared with $\sqrt{N}$. This iterative process may possibly lead to estimates of $n_0$ and $0^*$, but
I have not yet tried out the process. For most applications the less systematic methods
previously described will probably prove to be adequate, and any smoothing obtained by
these methods can be partially tested by means of $\chi^2$ in the form (19), together with the
inequalities (26) and (27). (See also the remarks following equations (65) and (87).)
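The lower bounds on the ratio of variance to expectation depend only on $r$ and are easily tabulated. The sketch below assumes that the bounds take the forms $1 - r^r e^{-r}/r!$ (from the maximum of $x^r e^{-x}$ at $x = r$) and $1 - 1/\sqrt{2\pi r}$ (from Stirling’s formula); the function names are illustrative.

```python
from math import exp, factorial, pi, sqrt

def poisson_peak(r):
    """Largest value of x^r e^{-x} / r! over positive x, attained at x = r."""
    return r**r * exp(-r) / factorial(r)

def variance_lower_factor(r):
    """Lower bound on V(n_r | H) / E(n_r | H) obtained from the Poisson peak."""
    return 1.0 - poisson_peak(r)

def stirling_lower_factor(r):
    """The simpler bound 1 - 1/sqrt(2 pi r), intended for r >= 2."""
    return 1.0 - 1.0 / sqrt(2 * pi * r)

# Tabulate both factors for small r.
table = {r: (variance_lower_factor(r), stirling_lower_factor(r)) for r in range(1, 11)}
factor_r1 = variance_lower_factor(1)    # equals 1 - 1/e
```

Since $r!$ exceeds its Stirling approximation, the first factor is always the larger of the two, so the simpler bound is slightly weaker but never wrong in direction.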


6. Estimation of some population parameters, including entropy. Let us consider the
population parameters

$$c_{m,n} = \sum_{\mu=1}^{s} p_\mu^m (-\log p_\mu)^n \qquad (m, n = 0, 1, 2, \ldots), \tag{29}$$

which can be regarded as measures of heterogeneity of the population. The sequence
$c_{0,0} = s$, $c_{1,0} = 1$, $c_{2,0}, c_{3,0}, \ldots$ may be called the ‘moment constants’ of the population, while
$c_{1,1}$ is called the ‘entropy’ in the modern theory of communication (see Shannon, 1948).
More generally, $c_{1,n}$ is the $n$th moment about zero of the amount of information from each
selection of an animal (or word), where ‘amount of information’ is here used in the sense of
Good (1950b, p. 75), i.e. as minus the logarithm of a probability. (The last sentence of p. 75
of this reference is incorrect, as Prof. M. S. Bartlett has pointed out.) We find it no more
difficult to give estimates of $c_{m,n}$ than of $c_{1,n}$, at any rate when $n = 0$ or $1$.
It is an immediate consequence of (10) that an unbiased estimate of $c_{m,0}$ is

$$\hat{c}_{m,0} = \frac{1}{N^{(m)}} \sum_r r^{(m)}\,n_r. \tag{30}$$

$\hat{c}_{2,0}$ is in effect used by Yule (1944) to measure the heterogeneity of samples of vocabulary,
and he calls $10{,}000\,\hat{c}_{2,0}(1 - 1/N)$ the ‘characteristic’ of the material. The sequence of all
sampling moments of $\hat{c}_{2,0}$ involves all the population parameters $c_{m,0}$. For example, as
pointed out by Simpson (1949), for large $N$,

$$V(\hat{c}_{2,0}) \approx \frac{4}{N}\,(c_{3,0} - c_{2,0}^2). \tag{30}$$
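The unbiased estimate of $c_{m,0}$ and the ‘characteristic’ described above can be computed directly from the frequencies of the frequencies. A minimal sketch, with hypothetical names and data, meant for $m \geq 1$:

```python
def falling(t, m):
    """Falling factorial t^(m) = t (t-1) ... (t-m+1)."""
    out = 1
    for j in range(m):
        out *= (t - j)
    return out

def c_hat(n, N, m):
    """Unbiased estimate of c_{m,0}: sum of r^(m) n_r, divided by N^(m)."""
    return sum(falling(r, m) * n_r for r, n_r in n.items()) / falling(N, m)

def yule_characteristic(n, N):
    """The quantity 10,000 * c_hat_{2,0} * (1 - 1/N) described in the text."""
    return 10000 * c_hat(n, N, 2) * (1 - 1 / N)

# Hypothetical sample: n_1 = 2, n_2 = 1, n_3 = 1, so N = 7.
n = {1: 2, 2: 1, 3: 1}
N = sum(r * n_r for r, n_r in n.items())

c2 = c_hat(n, N, 2)               # (2*1 + 6*1) / (7*6) = 8/42
characteristic = yule_characteristic(n, N)
```

For $m = 1$ the estimate is identically 1, in agreement with $c_{1,0} = 1$.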










Unbiased statistics are rather unfashionable nowadays, partly because they can take
impossible values. For example, $\hat{c}_{m,0}$ could vanish, although it is easy to see that $c_{m,0} \geq s^{1-m}$.
(Compare Good (1950, p. 103), where estimates of $c_{m,0}$ are implicit for general multinomial
distributions, no attempt being made to smooth the $n_r$’s.) We shall find estimates of $c_{m,1}$ and
also estimates of $c_{m,0}$ that are at least sometimes better than $\hat{c}_{m,0}$.


We have

$$c_{m,0} = \frac{1}{N^{(m)}} \sum_r r^{(m)}\,\mathscr{E}(n_r \mid H), \tag{31}$$

since this is in effect what is meant by saying that $\hat{c}_{m,0}$ is an unbiased estimate of $c_{m,0}$. If the
statistician is satisfied with his smoothing, i.e. if he assumes that $n'_r \approx \mathscr{E}(n_r \mid H)$, and if he has
forced $N' = N$, then he can estimate $c_{m,0}$ as

$$\tilde{c}_{m,0} = \frac{1}{N^{(m)}} \sum_r r^{(m)}\,n'_r, \tag{32}$$

and he will be prepared to assume that this is a more efficient estimate than $\hat{c}_{m,0}$. More
generally if the smoothing is satisfactory for $r = 1, 2, \ldots, t$ but not for all larger values of
$r$, then a good estimate of $c_{m,0}$ will be $\tilde{c}_{m,0}(t)$, where

$$\tilde{c}_{m,0}(t) = \frac{1}{N^{(m)}} \left[\sum_{r=1}^{t} r^{(m)}\,n'_r + \sum_{r=t+1}^{\infty} r^{(m)}\,n_r\right]. \tag{33}$$


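We shall next consider estimates $\tilde{c}_{m,1}$ of $c_{m,1}$. We shall begin by proving that (exactly)

$$c_{m,1} = \frac{1}{N^{(m)}} \sum_r r^{(m)}\,\mathscr{E}(n_r \mid H) \left\{\frac{1}{r+1} + \frac{1}{r+2} + \ldots + \frac{1}{N-r} - \frac{d}{dr}\log \mathscr{E}(n_r \mid H) - \mathscr{E}[\log(1 - q_r) \mid H]\right\}. \tag{34}$$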

The differential coefficient in this expression is made meaningful by means of a suitable
definition of $\mathscr{E}(n_r \mid H)$ for non-integral values of $r$. This definition is obtained from equation
(10) by writing $\Gamma(N+1)/\{\Gamma(r+1)\,\Gamma(N-r+1)\}$ instead of $\binom{N}{r}$.


In order to prove (34) we shall need the following generalization of (13), valid for any
function $f(\cdot)$:

$$\mathscr{E}(n_r \mid H)\,\mathscr{E}\{f(q_r) \mid H\} = \binom{N}{r} \sum_\mu p_\mu^r (1-p_\mu)^{N-r} f(p_\mu). \tag{35}$$

We also require the following property of the gamma function. If $b$ is a non-negative integer,

$$\frac{\Gamma'(b+1)}{\Gamma(b+1)} = 1 + \frac12 + \ldots + \frac1b - \gamma, \tag{36}$$

where $\gamma = 0.577215\ldots$ is the Euler-Mascheroni constant. (See, for example, Jeffreys &
Jeffreys (1946, §15.04).) It follows from (10) and (36) that








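$$\frac{d}{dr}\,\mathscr{E}(n_r \mid H) = \sum_\mu \binom{N}{r} p_\mu^r (1-p_\mu)^{N-r} \left(\frac{1}{r+1} + \frac{1}{r+2} + \ldots + \frac{1}{N-r} + \log\frac{p_\mu}{1-p_\mu}\right)$$

$$= \mathscr{E}(n_r \mid H) \left\{\frac{1}{r+1} + \ldots + \frac{1}{N-r} + \mathscr{E}\Bigl(\log\frac{q_r}{1-q_r} \Bigm| H\Bigr)\right\},$$

by (35). Therefore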


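$$\mathscr{E}(n_r \mid H)\left(\frac{1}{r+1} + \ldots + \frac{1}{N-r}\right) - \frac{d}{dr}\,\mathscr{E}(n_r \mid H) - \mathscr{E}(n_r \mid H)\,\mathscr{E}\{\log(1 - q_r) \mid H\}$$

$$= \mathscr{E}(n_r \mid H)\,\mathscr{E}(-\log q_r \mid H) = \binom{N}{r} \sum_\mu p_\mu^r (1-p_\mu)^{N-r} (-\log p_\mu),$$

by (35) again. Multiplying by $r^{(m)}/N^{(m)}$ and summing with respect to $r$, we find that the
right-hand side of (34) equals

$$\sum_\mu p_\mu^m (-\log p_\mu) \sum_r \binom{N-m}{r-m} p_\mu^{r-m} (1-p_\mu)^{N-r} = c_{m,1},$$

as asserted.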
$c_{m,2}$ can be evaluated in a similar manner by first writing down $\dfrac{d^2}{dr^2}\,\mathscr{E}(n_r \mid H)$, but the
result is complicated and will be omitted.
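As in the estimation of $c_{m,0}$, if the statistician is satisfied with his smoothing, then he can
write

$$c_{m,1} \approx \frac{1}{N^{(m)}} \sum_r r^{(m)}\,n'_r \left\{\frac{1}{r+1} + \ldots + \frac{1}{N-r} - \frac{d}{dr}\log n'_r - \mathscr{E}[\log(1 - q_r) \mid H]\right\}.$$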


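If $N$ is large the approximation can be written

$$c_{m,1} \approx \frac{1}{N^{(m)}} \sum_r r^{(m)}\,n'_r \left\{\log N - \Bigl(1 + \frac12 + \ldots + \frac1r - \gamma\Bigr) - \frac{d}{dr}\log n'_r + \frac{r^* - r}{N}\right\}.$$

Now it is intuitively clear that $r^* - r$ must be $O(\sqrt{r}) = o(N)$, and
therefore

$$c_{m,1} \approx \frac{1}{N^{(m)}} \sum_r r^{(m)}\,n'_r \left\{\log N - \Bigl(1 + \frac12 + \ldots + \frac1r - \gamma\Bigr) - \frac{d}{dr}\log n'_r\right\}
= \tilde{c}_{m,0}\log N - \frac{1}{N^{(m)}} \sum_r r^{(m)}\,n'_r \Bigl(g_r + \frac{d}{dr}\log n'_r\Bigr), \tag{37}$$

where

$$g_r = 1 + \frac12 + \ldots + \frac1r - \gamma. \tag{38}$$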


In particular, the entropy $c_{1,1} \approx \tilde{c}_{1,1}$, where

$$\tilde{c}_{1,1} = \log N - \frac{1}{N} \sum_r r\,n'_r \Bigl(g_r + \frac{d}{dr}\log n'_r\Bigr). \tag{39}$$

The differentiation can be performed graphically for all $r$ or by numerical differentiation for
$r = 3, 4, 5, \ldots$. (For numerical differentiation see, for example, Jeffreys & Jeffreys (1946,
§9.07).) Another estimate of the entropy is $\hat{c}_{1,1}$, where


$$\hat{c}_{1,1} = \log N - \frac{1}{N} \sum_r r\,n_r \Bigl(g_r + \frac{d}{dr}\log n'_r\Bigr), \tag{40}$$

in which the ‘prime’ has been omitted from the first occurrence of $n'_r$ in (39). This estimate,
$\hat{c}_{1,1}$, has leanings towards being an unbiased estimate of the entropy. It can hardly be as
good as (39) when the smoothing is reliable. Perhaps the best method of using the present
theory for estimating $c_{m,1}$ is to use the compromise $\tilde{c}_{m,1}(t)$ defined in the obvious way by
analogy with (33). For large values of $r$, the factor $g_r + \dfrac{d}{dr}\log n'_r$ may be replaced by $\log r$ to
a good approximation. Terms of $\tilde{c}_{m,1}(t)$ for which this approximation is made, i.e. terms of
the form $r^{(m)} n_r \log r$, may be regarded as crude and unadjusted.
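The entropy estimates can be sketched in code once the slope of $\log n'_r$ is available. In the sketch below the slope is supplied by the caller as a function `dlog`, a hypothetical interface chosen for this example because the differentiation is left to graphical or numerical methods; `crude_entropy` is the fully ‘crude and unadjusted’ form in which every factor is replaced by $\log r$.

```python
from math import log

EULER_GAMMA = 0.5772156649015329

def g(r):
    """g_r = 1 + 1/2 + ... + 1/r - gamma, as in formula (38)."""
    return sum(1.0 / j for j in range(1, r + 1)) - EULER_GAMMA

def entropy_estimate(n, N, dlog):
    """log N - (1/N) * sum of r n_r (g_r + dlog(r)), the form of formula (40).

    dlog(r) must return the slope of log n'_r at r, read off the smoothed curve.
    Passing smoothed values as n gives the form of formula (39) instead.
    """
    return log(N) - sum(r * n_r * (g(r) + dlog(r)) for r, n_r in n.items()) / N

def crude_entropy(n, N):
    """log N - (1/N) * sum of r n_r log r: every factor replaced by log r."""
    return log(N) - sum(r * n_r * log(r) for r, n_r in n.items()) / N

# Hypothetical sample: four species, each observed twice, so N = 8.
n = {2: 4}
N = 8
crude = crude_entropy(n, N)     # equals log 4, the entropy of four equal classes
```

The crude form is what one obtains by substituting the sample proportions $r/N$ into the definition of entropy, and for large $r$ the difference $g_r - \log r$ is close to $1/(2r)$.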


7. Special hypotheses, H. In this section we shall consider some special classes of hypotheses,
$H$, which determine the distribution of the $p_\mu$. So far we have taken this distribution as
discrete for the sake of logical simplicity. In the present section we shall find it convenient
to assume that there is a density function, $f(p)$, where $f(p)\,dp$ is the number of species whose
population frequencies lie between $p$ and $p + dp$. (The formulae may of course be generalized
to arbitrary distributions by using the Stieltjes integral.) Clearly


$$\int_0^1 f(p)\,dp = s, \tag{41}$$

$$\int_0^1 p\,f(p)\,dp = 1. \tag{42}$$

The expected value of $p$ for an animal at random from the population is

$$\mathscr{E}(p \mid H) = \int_0^1 p^2 f(p)\,dp = c_{2,0}. \tag{43}$$


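The appropriate modifications of the previous formulae are obvious. For example, instead of (10) and (20) we have
$$\mathscr{E}_N(n_r \mid H) = \binom{N}{r}\int_0^1 p^r (1-p)^{N-r} f(p)\,dp, \tag{44}$$
$$\mathscr{E}_N(n_r^2 \mid H) = \mathscr{E}_N(n_r \mid H) + \frac{N!}{r!\,r!\,(N-2r)!}\iint p^r q^r (1-p-q)^{N-2r} f(p) f(q)\,dp\,dq - \frac{N!}{r!\,r!\,(N-2r)!}\int p^{2r}(1-2p)^{N-2r} f(p)\,dp. \tag{45}$$

Notice the elegant checks of (44) and (45) that $\mathscr{E}_0(n_0 \mid H) = s$, $\mathscr{E}_1(n_1 \mid H) = 1$, $V_0(n_0 \mid H) = 0$, $V_1(n_1 \mid H) = 0$. Formula (44) leads to the less precise but often more convenient formula
$$\mathscr{E}_N(n_r \mid H) = \frac{1}{r!}\left[1 + O\!\left(\frac{r^2}{N}\right)\right]\int_0^1 (pN)^r e^{-pN} f(p)\,dp \approx \frac{1}{r!}\int_0^1 (pN)^r e^{-pN} f(p)\,dp \quad (r^2 = o(N)), \tag{46}$$
while a similar treatment of formula (45) leads back merely to formula (22).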

We shall now list a number of different types of possible hypotheses and then discuss 
them. The normalizing constants are all deduced from (42). 
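$H_1$ (Pearson's Type I):
$$f(p) = \frac{(\alpha+\beta+2)!}{(\alpha+1)!\,\beta!}\,p^{\alpha}(1-p)^{\beta} \quad (\alpha > -1,\ \beta > -1). \tag{47}$$
$H_2$ (Pearson's Type III):
$$f(p) = \frac{\beta^{\alpha+2}}{(\alpha+1)!}\,p^{\alpha}e^{-\beta p} \quad (\alpha > -1,\ \beta > 0). \tag{48}$$
$H_3$ (same as $H_2$ but with $\alpha = -1$):
$$f(p) = \beta p^{-1} e^{-\beta p} \quad (\beta > 0). \tag{49}$$
$H_4$:
$$f(p) \propto p^{\alpha}\exp(-\beta p - \epsilon/p) \quad (\beta > 0,\ \epsilon > 0). \tag{50}$$
$H_5$ (truncated form of $H_3$):
$$f(p) = \begin{cases} \beta p^{-1} e^{-\beta p} & (p > p_0), \\ 0 & (p < p_0). \end{cases} \tag{51}$$
$H_6$ (truncated form of another special case of $H_2$):
$$f(p) = \begin{cases} \dfrac{p^{-2} e^{-\beta p}}{E(p_0\beta)} & (p > p_0), \\ 0 & (p < p_0), \end{cases} \tag{52}$$
where $E(w) = -\mathrm{Ei}(-w) = \int_w^{\infty} u^{-1} e^{-u}\,du$. $\mathrm{Ei}(w)$ is known as the 'exponential integral' and has been tabulated several times. (For a list of these tables see Fletcher, Miller & Rosenhead (1946, §§13.2 and 13.21).)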

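We list also a few less completely formulated hypotheses, $H_7$, $H_8$ and $H_9$, for which the population is not explicitly specified, but only the values of $\mathscr{E}_N(n_r \mid H)$. Hence for these hypotheses the parameters may depend on $N$.

$H_7$ (Zipf laws):
$$\mathscr{E}_N(n_r \mid H_7) \propto r^{-\zeta} \quad (r \ge 1,\ \zeta > 0), \tag{53}$$
where $\zeta$ is often taken as 2 by Zipf. (See also (94) below.)

$H_8$ ($H_7$ with a convergence factor):
$$\mathscr{E}_N(n_r \mid H_8) = \frac{\lambda x^r}{r^{\zeta}} \quad (r \ge 1,\ \zeta > 0,\ 0 < x < 1). \tag{54}$$
$H_9$ (a modification of a special case of $H_8$):
$$\mathscr{E}_N(n_r \mid H_9) = \frac{\lambda x^r}{r(r+1)} \quad (r \ge 1). \tag{55}$$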


We now discuss the nine hypotheses.

(i) $H_1$ has the advantage that the exact formula (44) can be evaluated in elementary terms. We can see from (41) and (43) that
$$s = \frac{\alpha+\beta+2}{\alpha+1}, \tag{56}$$
$$\mathscr{E}(p \mid H_1) = \frac{\alpha+2}{\alpha+\beta+3} = \frac{(\alpha+2)(\alpha+\beta+2)}{(\alpha+1)(\alpha+\beta+3)}\cdot\frac{1}{s}. \tag{57}$$
In most applications we want $f(p)$ to be small when $p$ is not small and $\mathscr{E}(p \mid H)$ to be large compared with $1/s$. Hence if a hypothesis of the form $H_1$ is to be appropriate at all, we shall usually want $\beta$ to be large, by (47), and $\alpha$ to be close to $-1$, by (57).

By (44) we see that
$$\mathscr{E}_N(n_r \mid H_1) = \binom{N}{r}\frac{(\alpha+\beta+2)!\,(\alpha+r)!\,(\beta+N-r)!}{(\alpha+1)!\,\beta!\,(\alpha+\beta+N+1)!}. \tag{58}$$
Hence, by (2'), if the smoothed values $n_r'$ and $n_{r+1}'$ were equal to their expectations, given $H_1$, we would have
$$r^* = \frac{(\alpha+r+1)(N-r)}{\beta+N-r}. \tag{59}$$

(ii) $H_2$ can be regarded as a convenient approximation to $H_1$ if $\beta > 0$. Strictly, the hypothesis $H_2$ is impossible since it allows values of $p$ greater than 1, but it gives all such values of $p$ combined a very small probability provided that $\beta$ is large. $H_2$ was used by Greenwood & Yule (1920) and by Fisher (see Corbet et al. 1943). We have
$$s = \frac{\beta}{\alpha+1}, \qquad \mathscr{E}(p \mid H_2) = \frac{\alpha+2}{\beta}, \tag{60}$$
so that $\alpha$ must be close to $-1$. Hence, if $r^2 = o(N)$,
$$\mathscr{E}_N(n_r \mid H_2) \approx \beta\,\frac{(r+\alpha)!}{r!\,(\alpha+1)!}\left(\frac{\beta}{N+\beta}\right)^{\alpha+1}\left(\frac{N}{N+\beta}\right)^{r}, \tag{61}$$
which is of the negative binomial form.


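(iii) Of all hypotheses of the form $H_2$, Fisher (Corbet et al. 1943) was mainly concerned with $H_3$, the case $\alpha = -1$. (See example (i) in §8 below.) Then
$$s = \infty, \qquad \mathscr{E}(p \mid H_3) = \frac{1}{\beta}, \tag{62}$$
$$\mathscr{E}_N(n_r \mid H_3) \approx \frac{\beta}{r}\left(\frac{N}{N+\beta}\right)^{r} = \frac{\beta x^r}{r} \quad (r^2 = o(N)), \tag{63}$$
say. For large samples, $x$ (which, unlike $\beta$, depends on $N$) is close to 1 and the factor $x^r$ may be regarded as a convergence factor which prevents $\sum_{r=1}^{\infty}\mathscr{E}_N(n_r \mid H_3)$ from becoming infinite. The convergence factor also increases the likelihood of being able to find a satisfactory fit to given frequencies, $n_r$, merely because it involves a new parameter.

We see from (22) that
$$V_N(n_r \mid H_3) \approx \mathscr{E}_N(n_r \mid H_3)\left\{1 - \frac{1}{2^{2r+1}}\binom{2r}{r}\left(\frac{N(N+\beta)}{(N+\tfrac{1}{2}\beta)^2}\right)^{r}\right\}. \tag{64}$$
If $\beta r = o(N)$ it follows that
$$\frac{V_N(n_r \mid H_3)}{\mathscr{E}_N(n_r \mid H_3)} \approx 1 - \frac{1}{2^{2r+1}}\binom{2r}{r}. \tag{65}$$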


Thus in these circumstances $V_N(n_r \mid H_3)$ lies between the bounds given by (26) and (27), being for each $r$ about twice as close to the smaller bound as to the larger one. When applying the chi-squared test, where $\chi^2$ is defined by equation (19), we can hardly go far wrong by assuming (65) to be applicable whatever the distribution determined by $H$ may be. But, of course, we may often be able to improve on (65) when $H$ is specified in terms of the distribution of $p$. For convenience in applying (65) we give a short table of values of
$$k_r = \left[1 - \frac{1}{2^{2r+1}}\binom{2r}{r}\right]^{-1}. \tag{66}$$

     r    k_r        r    k_r        r    k_r
     1    1.33       6    1.13      11    1.10
     2    1.23       7    1.12      12    1.09
     3    1.19       8    1.11      13    1.09
     4    1.16       9    1.11      14    1.08
     5    1.14      10    1.10      15    1.08

For larger values of $r$, the approximation $1 + 1/\{2\sqrt{(\pi r)}\}$ is correct to two places of decimals.
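The short table of $k_r$ can be regenerated from the variance ratio in (65). The sketch below assumes only that $k_r$ is the reciprocal of the right-hand side of (65); the function names are illustrative.

```python
import math

def k(r: int) -> float:
    """k_r = 1 / (1 - C(2r, r) / 2^(2r+1)), the reciprocal of the ratio in (65)."""
    return 1.0 / (1.0 - math.comb(2 * r, r) / 2 ** (2 * r + 1))

def k_large_r(r: int) -> float:
    """Large-r approximation quoted in the text: 1 + 1 / (2 sqrt(pi r))."""
    return 1.0 + 1.0 / (2.0 * math.sqrt(math.pi * r))

if __name__ == "__main__":
    for r in range(1, 16):
        print(r, round(k(r), 2), round(k_large_r(r), 2))
```

In a chi-squared calculation each term $(n_r - \mathscr{E}_r)^2/\mathscr{E}_r$ is simply multiplied by $k_r$.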





Suppose we are given a sample of size $N$ and we wish to estimate $\beta$ and $x$. The method used by Fisher was to equate the observed values of $\sum r n_r = N$ and $\sum n_r = S$ to their expected values. (Note that $S$ is the observed number of species and should not be confused with $s$.) This led him to the equations
$$-\beta\log_e(1-x) = S, \qquad N = \frac{\beta x}{1-x}, \tag{67}$$
$$\frac{N}{S} = \frac{x}{-(1-x)\log_e(1-x)}, \tag{68}$$
which he solved by using a table of $x/(1-x)$ in terms of $\log_{10}(N/S)$.
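Where Fisher used a table, equation (68) can be solved numerically. The following is a minimal sketch using bisection (our choice of method, not Fisher's), assuming $N > S$; `fisher_fit` is a hypothetical name.

```python
import math

def fisher_fit(N: int, S: int, tol: float = 1e-12):
    """Solve (68) for x by bisection, then obtain beta from N = beta x / (1 - x)."""
    def ratio(x: float) -> float:
        # Right-hand side of (68); it rises from 1 towards infinity as x goes from 0 to 1.
        return x / (-(1.0 - x) * math.log(1.0 - x))

    target = N / S
    lo, hi = 1e-9, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    beta = N * (1.0 - x) / x
    return beta, x

if __name__ == "__main__":
    # N and S of example (i) in section 8.
    print(fisher_fit(15609, 240))
```

With the data of example (i) this gives values close to the $\beta = 40.2$ and $x = 0.9974$ quoted there.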

A theoretically more satisfactory method of estimating $\beta$ and $x$ would be by minimizing $\chi^2$, defined by (19), with $r = \infty$. This method leads to equations which would be most laborious to solve by hand but which will be given here since large-scale computers now exist. To prevent misunderstanding we mention at once that Fisher obtained a perfectly good fit by the simpler method, in his example, i.e. example (i) of §8 below, though, as pointed out in §8, $H_3$ must not be too literally regarded as true.

By (65) we may write
$$\chi^2 = \sum_{r=1}^{\infty} k_r\left(\frac{r n_r^2}{\beta x^r} - 2n_r + \frac{\beta x^r}{r}\right). \tag{69}$$
The equations giving $\beta$ and $x$ will then be
$$\beta^2\sum k_r x^r/r = \sum r k_r n_r^2 x^{-r}, \tag{70}$$
$$\beta^2\sum k_r x^r = \sum r^2 k_r n_r^2 x^{-r}, \tag{71}$$
and these equations could be solved iteratively.
When $\beta$ and $x$ are specified the cumulative sums of $\mathscr{E}_N(n_r \mid H_3)$ can be found by making use of the approximation
$$\sum_{t \ge r}\frac{x^t}{t} \approx E(-r\log_e x) + \frac{x^r}{2r}\left(1 - \tfrac{1}{6}\log_e x + \frac{1}{6r}\right), \tag{72}$$
which will be a very good approximation if the terms involving $\tfrac{1}{6}\log_e x$ and $\frac{1}{6r}$ are negligible. This approximation can be obtained by means of the Euler-Maclaurin summation formula. (See, for example, Whittaker & Watson (1935, §7.21).)
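The Euler-Maclaurin approximation for the tail sum $\sum_{t \ge r} x^t/t$ can be checked against direct summation. This sketch assumes the standard form of the formula (integral, plus half the first term, minus one-twelfth of the first derivative at $r$) and evaluates $E(w)$ from its power series, which is adequate only for small or moderate $w$. All function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329

def E(w: float, terms: int = 60) -> float:
    """E(w) = integral from w to infinity of exp(-u)/u du, from its power series.

    Suitable for small or moderate w (say w < 5); cancellation spoils it for large w.
    """
    total = -EULER_GAMMA - math.log(w)
    term = 1.0
    for k in range(1, terms + 1):
        term *= -w / k        # term is now (-w)^k / k!
        total -= term / k     # contributes (-1)^(k+1) w^k / (k * k!)
    return total

def tail_sum_direct(x: float, r: int, t_max: int = 20_000) -> float:
    """Brute-force sum of x^t / t for t = r, r+1, ..., t_max - 1."""
    return sum(x ** t / t for t in range(r, t_max))

def tail_sum_em(x: float, r: int) -> float:
    """Euler-Maclaurin approximation to the same tail sum."""
    lx = math.log(x)
    return E(-r * lx) + x ** r / (2 * r) * (1 - lx / 6 + 1 / (6 * r))

if __name__ == "__main__":
    x, r = 0.9974, 21
    print(tail_sum_direct(x, r), tail_sum_em(x, r))
```

Multiplying either value by $\beta$ gives the expected number of species represented at least $r$ times under $H_3$; differences of two such tails give the 'summed' entries of the tables in §8.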


(iv) We have just seen that when $\alpha = -1$ in $H_2$ we obtain $s = \infty$ and of course $\mathscr{E}_N(n_0 \mid H) = \infty$. There are strong indications in examples (ii), (iii) and (iv) of §8 that we may wish to take $\alpha < -1$, and then even worse divergencies occur. For example, if $\alpha = -2$ we would obtain, from (61), the intolerable result
$$\mathscr{E}_N(n_1 \mid H_2)/\mathscr{E}_N(n_2 \mid H_2) = \infty.$$
In order to avoid these divergencies we could in theory use hypothesis $H_4$, with a small value of $\epsilon$. Unfortunately, this hypothesis seems to be analytically unwieldy; it is mentioned partly for its interest as intermediate between Pearson's Types III and V.

(v) Another method of avoiding divergencies is to use truncated distributions. These truncated distributions are not theoretically pleasing but at least some of them can be handled analytically. $H_5$ is a truncated form of $H_3$. We may describe $p_0$ as the smallest possible population frequency of any species. In most applications it would be difficult to obtain a sample large enough to determine $p_0$ with any accuracy. In fact if the estimate of $p_0$ were to be reliable the sample would need to be so large that $n_r$ would vanish for all small values of $r$. In the examples of §8, $n_1$ is always larger than any other value of $n_r$, so these samples would need to be increased greatly before one could expect even $n_1$ to vanish.


We obtain from (41)
$$s = \beta E(p_0\beta). \tag{73}$$
Now
$$E(w) = -\gamma - \log_e w + w - \frac{w^2}{2\cdot 2!} + \frac{w^3}{3\cdot 3!} - \dots, \tag{74}$$
an equation which is undoubtedly well known. It can be proved, for example, by using Dirichlet's formula for $\gamma$. (See, for example, Whittaker & Watson (1935, §12.3, example 2).) In particular, if $w$ is very small,
$$E(w) \approx -\gamma - \log_e w = -\log_e(\gamma' w), \tag{75}$$
where
$$\gamma' = e^{\gamma} = 1.781072. \tag{76}$$
(Cf. Jahnke & Emde (1933, p. 79), where our $\gamma'$ is denoted by $\gamma$.) Since $p_0$ is assumed to be small,
$$s \approx -\beta\log_e(p_0\gamma'\beta), \qquad p_0 \approx \frac{e^{-s/\beta}}{\gamma'\beta}. \tag{77}$$

On applying equation (46) we see that
$$\mathscr{E}_N(n_r \mid H_5) \approx \frac{\beta}{r}\left(\frac{N}{N+\beta}\right)^{r} \quad (r \ge 1,\ r^2 = o(N)), \tag{78}$$
$$\mathscr{E}_N(n_0 \mid H_5) \approx \beta E[p_0(N+\beta)] \approx -\beta\log_e[p_0\gamma'(N+\beta)]. \tag{79}$$
The check may be noticed that equations (77), (78) and (79) are consistent with
$$s = \sum_{r=0}^{\infty}\mathscr{E}_N(n_r \mid H_5).$$

Formula (77) is of some interest, but in most applications both $p_0$ and $s$ will be largely metaphysical, i.e. observable only within very wide proportional limits.


(vi) The difficulty of determining $p_0$ would not apply to the same extent if $\alpha = -2$, i.e. for hypothesis $H_6$. (This hypothesis is fairly appropriate for example (iv) of §8.) We have, by (46),
$$\mathscr{E}_N(n_1 \mid H_6) \approx \lambda x E[p_0(\beta+N)] \approx -\lambda x\log_e\frac{\gamma' p_0 N}{x}, \tag{80}$$
$$\mathscr{E}_N(n_r \mid H_6) \approx \frac{\lambda x^r}{r(r-1)} \quad (r \ge 2,\ r^2 = o(N)), \tag{81}$$
where $x$ and $\lambda$, unlike $\beta$ and $p_0$, depend on $N$ and are given by
$$x = \frac{N}{\beta+N}, \tag{82}$$
$$\lambda = \frac{\beta+N}{E(p_0\beta)} \approx -\frac{\beta+N}{\log_e(p_0\gamma'\beta)}, \quad\text{i.e.}\quad \beta p_0 \approx \exp\left[-\gamma - \frac{\beta+N}{\lambda}\right]. \tag{83}$$

If $\lambda$ and $x$ can be estimated from a sample, then $\beta$ and $p_0$ can be determined by (82) and (83) and $s$ can then be determined from (41), which gives
$$s + \beta = \frac{e^{-p_0\beta}}{p_0 E(p_0\beta)} \approx \frac{e^{-p_0\beta}}{-p_0\log_e(p_0\gamma'\beta)}. \tag{84}$$

In order to estimate $\lambda$ and $x$ from a sample, one could minimize $\chi^2$, more or less as described above for $H_3$. For this purpose and for others it may be noted that, by (22),





Vuln, | He Exe | He)=1-—— 5 ("") (v2) (85) » 
Val | He) Ena | He) yy (86) 
-—_ it set. (87) 


By comparing (85) with (65) we can get an idea of the smallness of the error arising when calculating $\chi^2$ if (65) is used for hypotheses other than $H_3$.

Another method of estimating $\lambda$ and $x$, rather less efficient, but easier, is the one analogous to that used by Fisher for $H_3$, namely, we may assume that the expected values of $N - n_1$ and of $S - n_1$ are equal to their observed values, i.e.
$$S - n_1 = \lambda\sum_{r=2}^{\infty}\frac{x^r}{r(r-1)} = \lambda\{x + (1-x)\log_e(1-x)\} = \lambda(1 - e^{-Y} - Ye^{-Y}), \tag{88}$$
$$N - n_1 = \lambda\sum_{r=2}^{\infty}\frac{x^r}{r-1} = -\lambda x\log_e(1-x) = \lambda Y(1 - e^{-Y}) = \lambda x Y, \tag{89}$$
$$(S - n_1)/(N - n_1) = Y^{-1} - (e^{Y} - 1)^{-1}, \tag{90}$$
where $x = 1 - e^{-Y}$. We may solve (90) iteratively for $Y$, i.e. $Y = \lim_{n\to\infty} Y_n$, where $Y_1 = \infty$ and, for $n = 1, 2, 3, \dots$,
$$Y_{n+1}^{-1} = \frac{S - n_1}{N - n_1} + (e^{Y_n} - 1)^{-1}. \tag{91}$$


When $\lambda$ and $x$ are specified, the cumulative sums of $\mathscr{E}_N(n_r \mid H_6)$ can be found by making use of the approximation
$$\sum_{t \ge r}\frac{x^t}{t(t-1)} \approx xE[-(r-1)\log_e x] - E(-r\log_e x) + \frac{x^r}{2r(r-1)}\left(1 - \tfrac{1}{6}\log_e x + \frac{1}{3r}\right), \tag{92}$$
which will be a very good approximation if the terms involving $\tfrac{1}{6}\log_e x$ and $\frac{1}{3r}$ are negligible (cf. equation (72)). If $(1-x)r$ is small while $r$ is large, then we can prove the following approximation:
$$\sum_{t \ge r}\mathscr{E}_N(n_t \mid H_6) \approx \lambda\left\{\frac{1}{r-1} - (1-x)[1 - \gamma - \log_e(1-x) - \log_e r]\right\}. \tag{93}$$
If $1-x$ is small but $(1-x)r$ is large, then
$$\sum_{t \ge r}\mathscr{E}_N(n_t \mid H_6) \approx \frac{\lambda x^{r}}{(1-x)r^2}. \tag{93 A}$$

When in doubt about the accuracy of (93) and (93 A) it is best to use (92), the calculation of which is, however, ill-conditioned, so that the exponential integrals may be needed to several decimal places.

Biometrika 40 17

(vii) We now come to the 'less completely formulated' hypotheses. $H_7$ is discussed by Zipf, especially with $\zeta = 2$ and also in the slightly modified form
$$\mathscr{E}(n_r \mid H) \propto (r^2 - \tfrac{1}{4})^{-1}. \tag{94}$$

(See Zipf (1949, pp. 546-7), where there are further references, including ones to J. B. Estoup, M. Joos, G. Dewey and E. V. Condon.) Yule (1944, p. 55) refers to Zipf (1932) and objects to Zipf's word distributions on two grounds. First Yule asserts that the fits are unsatisfactory, and secondly he points out that (in our notation)
$$N = \mathscr{E}_N\left(\sum r n_r \mid H_7\right) = \infty \quad\text{if } 1 < \zeta \le 2,$$
while
$$c_{2,0} = \int_0^1 p^2 f(p)\,dp = \mathscr{E}(p \mid H_7) = \mathscr{E}\left(\sum\frac{r(r-1)n_r}{N(N-1)}\,\Big|\,H_7\right) = \infty \quad\text{if } 2 < \zeta \le 3.$$


(viii) Yule's second objection to $H_7$ can be overcome by introducing a 'convergence factor', $x^r$, giving $H_8$. If $H_8$ is any good at all for any particular application then $x$ will be fairly close to 1. It would be of interest to specify $H_8$ in terms of a density function, $f(p)$, by solving the simultaneous integral equations
$$\frac{\lambda x^r}{r^{\zeta}} = \frac{1}{r!}\int_0^1 (Np)^r e^{-Np} f(p)\,dp \quad (r = 1, 2, 3, \dots). \tag{95}$$
If $\zeta = 1$, then $H_8$ reduces of course to $H_3$.


(ix) $H_9$ is of interest mainly because it works so well in examples (ii) and (iii) of §8. Besides its formal similarity to $H_7$ with $\zeta = 2$, $H_9$ also resembles $H_6$, in virtue of equation (81). A disadvantage of not specifying $f(p)$ is that $V_N(n_r \mid H_9)$ cannot be conveniently worked out from (22), though it can always be estimated from (23) with considerably more work. Moreover, a correct specification of $f(p)$ is more fundamental than that of the expected values of the $n_r$'s and is more likely to lead to a better understanding of the structure of the population.

In order to estimate $\lambda$ and $x$ from a sample, we could use either of the two methods discussed for $H_3$ and $H_6$, except that in the method of minimizing $\chi^2$ it would perhaps be best to guess a formula for $V_N(n_r \mid H_9)$, after experimenting with formula (23). We shall not discuss this method further in this section. The second method consists in determining $\lambda$ and $x$ from the equations
$$N = \lambda\sum_{r=1}^{\infty}\frac{x^r}{r+1} = -\frac{\lambda}{x}[x + \log_e(1-x)], \tag{96}$$
$$S = \lambda\sum_{r=1}^{\infty}\frac{x^r}{r(r+1)} = \frac{\lambda}{x}[x + (1-x)\log_e(1-x)], \tag{97}$$
$$\frac{S}{N} = \frac{x + (1-x)\log_e(1-x)}{-x - \log_e(1-x)}. \tag{98}$$


$x$ can be determined either by tabulating the right-hand side of (98) or by writing $x = 1 - e^{-Y}$ and determining $Y$ from the equation
$$Y^{-1} = (1 - e^{-Y})^{-1} - (1 + S/N)^{-1}. \tag{99}$$
$Y$ can be found iteratively by writing $Y = \lim Y_n$, where $Y_1 = 1 + N/S$, and, for $n = 1, 2, 3, \dots$,
$$Y_{n+1}^{-1} = (1 - e^{-Y_n})^{-1} - (1 + S/N)^{-1}. \tag{100}$$
Then, by (96), we can find $\lambda$ from
$$\lambda = \frac{xN}{Y - x}. \tag{101}$$

Having determined $\lambda$ and $x$ we may wish to test how well $H_9$ agrees with the sample. For this purpose we need to calculate cumulative sums of the expectations of the $n_r$'s. This can be done by means of the approximation
$$\sum_{t \ge r}\frac{x^t}{t(t+1)} \approx E(-r\log_e x) - \frac{1}{x}E[-(r+1)\log_e x] + \frac{x^r}{2r(r+1)}\left(1 - \tfrac{1}{6}\log_e x + \frac{1}{3r}\right), \tag{102}$$
deducible from (92). If $(1-x)r$ is small while $r$ is large, then we have the following approximation:
$$\sum_{t \ge r}\mathscr{E}(n_t \mid H_9) \approx \lambda\left\{\frac{1}{r} - (1-x)[1 - \gamma - \log_e(1-x) - \log_e r]\right\}, \tag{103}$$
deducible from and of precisely the same form as (93). An idea of the closeness of this approximation can be obtained from example (ii) below. If $1-x$ is small but $(1-x)r$ is large, then
$$\sum_{t \ge r}\mathscr{E}(n_t \mid H_9) \approx \frac{\lambda x^{r}}{(1-x)r^2}. \tag{103 A}$$

When in doubt about the accuracy of (103) and (103 A) it is best to use equation (102). (See the remarks following equation (93 A).)


8. Examples. In each of the four examples given below we use at least two different methods of smoothing the data. One of these methods is, in each example, the graphical smoothing of $\sqrt{n_r}$ for the smaller values of $r$ and another method is the fitting of one or other of the nine special hypotheses of §7. The discussion of these examples is by no means intended to be complete.


Example (i). Captures of Macrolepidoptera in a light-trap at Rothamsted. (Summarized from Williams's data (Corbet et al. 1943).) N = 15,609, S = 240.

     r   n_r   n_r^iv       r   n_r   n_r^iv           r     n_r   n_r^iv (summed)†
     1    35    40.1       11     2     3.5        21-30      18     15.5
     2    11    20.0       12     2     3.2        31-50      16     18.0
     3    15    13.2       13     5     3.0        51-70      17     11.4
     4    14     9.9       14     2     2.8       71-100       8     11.2
     5    10     7.9       15     4     2.6      101-150       9     11.8
     6    11     6.6       16     3     2.4      151-200       7      7.4
     7     5     5.6       17     3     2.3      201-500      12     16.1
     8     6     4.8       18     3     2.1     501-1000       6      4.6
     9     4     4.3       19     3     2.0     1001-inf       1      0.9
    10     4     3.9       20     4     1.9         2349       1       -

† In future tables this word 'summed' will be taken for granted and omitted.

We now present the results of the calculations, followed by comments. (The columns headed $n_r^{iv}$ in the table above are explained in these comments.)











































     r   n_r   n_r'   n_r''   n_r'''   n_r^iv    r*    r**   r***   r****
     1    35   35     35      35       40        1.1   1.4   1.3    1
     2    11   19.4   24.0    22.5     20.0      2.1   2.3   2.2    2
     3    15   13.7   18.1    16.3     13.3      3.0   2.9   3.0    3
     4    14   10.2   13.1    12.3     10.0      3.8   3.8   3.9    4
     5    10    7.8   10.2     9.7      7.9      4.8   4.8   4.8    5
     6    11    6.3    8.1     7.7      6.6      5.9   5.9   5.5    6
     7     5    5.3    6.8     6.0      5.6       -     -     -     -





The function $n_r'$ was obtained by plotting $\sqrt{n_r}$ against $r$ for $1 \le r \le 20$ and smoothing for $1 \le r \le 7$ by eye, holding in mind the method of least squares. (See note (i) of §3.) $n_r''$ was obtained in the same way, but an attempt was made to keep away from the graph of $n_r'$ (except at $r = 1$) in order to find out how different a smoothing was reasonable. Next $n_r'''$ was obtained by smoothing the cumulative sums $\sum_{t=1}^{r} n_t$. Finally, $n_r^{iv}$ is the function obtained by Fisher, i.e. using our hypothesis $H_3$ (equation (63)) with $\beta = 40.2$ and $x = 0.9974$. A more complete tabulation of $n_r^{iv}$ is given in the first table. The 'summed' values of $n_r^{iv}$ were calculated by means of equation (72). No statistical test is necessary to see that the fit of $n_r^{iv}$ is very good. The values of $r^*$ corresponding to the four smoothings of the data are denoted by $r^*$, $r^{**}$, $r^{***}$ and $r^{****}$ respectively. (Logically this gives $r^*$ two different meanings.) ($r^{****} = 0.9974r$, by (2') and (63).) In accordance with §3 we could force the $r^*$'s, etc., to be smooth. This has not been tried here. What is clear is that if $H_3$ is not accepted then most of the values of $r^*$, etc., are unreliable to within about 0.2 or 0.3. The approximate values of $\chi^2$ given by (19) with $r = 7$ and assuming (65) are 10.9, 11.1, 9.4 and 11.7 respectively. The number of degrees of freedom is somewhere between 6 and 7. It seems safe to take it as 5 for $n_r'''$, 6 for $n_r'$ and $n_r''$ and 7 for $n_r^{iv}$. None of the values of $\chi^2$ is particularly significant, though all are a bit large. The data can be blamed for the largeness of the values of $\chi^2$, since $n_2$ is obviously much smaller than it ought to be. Of the four smoothings Fisher's seems to be the most likely to give the best approximations to the 'true expectations'. There is hardly anything to choose on the evidence of the sample, but Fisher's smoothing has the advantage of being analytically simple.

The most definite result of interest in this example does not depend much on the smoothing, namely, that the proportion of the population not represented by the species in the sample is about $(35 \pm 5)/15{,}609$. For the '$\pm 5$' see formula (65). Perhaps this standard error should be increased slightly, say from 5 to 8, to allow for the preference given to $n_r^{iv}$.

Formula (77), if it is applicable (i.e. if the truncated form, H,, of H, is assumed), may be 
written —logi9 %) = 1-:18+0-011s, so that if s were say 1000, then the smallest population 
frequency would be about 10-1*, This is mentioned only for its theoretical interest: it is an 
unjustifiable extrapolation to suppose that the distribution defined by H, would stand up 
to sample sizes large enough to demonstrate clearly the values of s and p,. N would need to 
be of the order of 10/9. The proposition which is made probable by the actual sample is that 
H, and H, (with the assigned values of the parameters) would give good fits to the values of 
n, on other independent samples of 16,000 or less, i.e. that H, and H; provide good methods 
of smoothing the data. The cautious tone of this statement can be more fully justified by 
the following considerations.

If $H_3$ were reliable then it should be possible to use it to estimate the simpler measures of heterogeneity, such as $c_{2,0}$. Now we can see by (30) that $\hat c_{2,0} = 0.03935$ and $\hat c_{3,0} = 0.0035$. (For the calculations, the complete data given by Williams must be used.) Hence, by (30 A), it is reasonable to write $c_{2,0} = 0.03935 \pm 0.0007$. Let us then see what value for $c_{2,0}$ is implied by $H_3$. We have
$$\frac{1}{N^2}\sum r(r-1)n_r^{iv} = \frac{\beta}{N^2}\sum (r-1)x^r = \frac{\beta x^2}{N^2(1-x)^2} = 0.0243.$$
[As a check, $\int_0^{\infty} p^2 f(p)\,dp = \beta\int_0^{\infty} p e^{-\beta p}\,dp = \beta^{-1}\int_0^{\infty} q e^{-q}\,dq = \beta^{-1} = 0.025$.]

Clearly then $H_3$ cannot be used to estimate $c_{2,0}$. It would be true if misleading to say that $H_3$ is decisively disproved by the data. Similar remarks would apply in the examples below.


Example (ii). Eldridge's statistics for fully inflected words in American newspaper English. Eldridge's statistics (1911) are summarized by Zipf (1949, pp. 64 and 25). We give a summary of Zipf's summary in column (ii) below; more fully in the second table. N = 43,989, S = 6,001.

In this example the values of $n_r$ for $r \le 10$ are much larger than in example (i), so we have far more confidence in the smoothing that is independent of particular hypotheses. We shall present some of the numerical calculations in columns and then make comments on each column. We may assert at once, however, by equations (7), (8) and (9), that the proportion of the population represented by the sample is close to $1 - n_1/N = 14/15$. If a foreigner were to learn all 6001 words which occurred in the sample he would afterwards meet a new word at about 6.7 % of words read. If he learnt only $S - n_1 = 3025$ words he would meet a new word about 11.6 % of the time. The corresponding results for word-roots rather than for fully inflected words would be of more interest to a linguist.
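The two percentages quoted for the foreigner follow directly from $n_1$, $n_2$ and $N$. A minimal sketch, with illustrative function names, using the expression $(n_1 + 2n_2)/N$ that appears explicitly in example (iii):

```python
def unseen_mass(n1: int, N: int) -> float:
    """Estimated proportion of the population not represented in the sample: n_1 / N."""
    return n1 / N

def unseen_mass_without_singletons(n1: int, n2: int, N: int) -> float:
    """The same proportion when species seen only once are also left unlearnt: (n_1 + 2 n_2) / N."""
    return (n1 + 2 * n2) / N

if __name__ == "__main__":
    # Eldridge's data: N = 43,989, n_1 = 2976, n_2 = 1079.
    print(round(100 * unseen_mass(2976, 43989), 1))
    print(round(100 * unseen_mass_without_singletons(2976, 1079, 43989), 1))
```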








    (i)   (ii)   (iii)       (iv)        (v)        (vi)    (vii)          (viii)   (ix)      (x)     (xi)
     r    n_r    sqrt(n_r)   b_r, say    -Δb_r      r b_r   (r b_r)', say  b_r'     b_r'^2    n_r'    r*
     1    2976   54.5        54.5        21.8       54.5    54.5           54.5     2976      2961    0.73
     2    1079   32.7        32.7        10.0       65.4    65.4           32.7     1079      1075    1.4
     3     516   22.7        22.7         5.7       68.1    67.8           22.6      511       509    2.4
     4     294   17.1        17.0         2.8       68.0    70.2           17.5      306       305    3.4
     5     212   14.6        14.2         1.8       71.0    72.6           14.5      210       209    4.4
     6     151   12.3        12.4         1.5       74.4    74.7           12.4      154       153    5.4
     7     105   10.2        10.9         1.3       76.3    76.3           10.9      119       118    6.2
     8      84    9.2         9.6         1.2       76.8    76.8            9.6       92        91     -
     9      86    9.3         8.4         1.1       75.6    75.6            8.4       71        70     -
    10      45    6.7         7.3          -          -       -              -         -         -     -









































(i) and (ii). We first consider the values of $r$ only as far as $r = 10$. For larger values of $r$ the smoothing could be done by using $k$-point smoothing formulae with $k = 2\sqrt{r}$.

(iii) Each entry in this column has standard error of about $\tfrac{1}{2}$, so one place of decimals is appropriate.

(iv) This column was obtained by smoothing a graph of column (iii) by eye. Experiments with the five-point smoothing formula did not give quite as convincing results. For the five-point smoothing formula, see, for example, Whittaker & Robinson (1944, §146). For the present application it would be $\sqrt{n_r'} = \sqrt{n_r} - \tfrac{3}{35}\Delta^4(\sqrt{n_{r-2}})$ $(r = 3, 4, 5, \dots)$.

(v) This column of differences is given as a verification of the smoothness of column (iv). In fact minor adjustments were made in column (iv) in order to improve the smoothness of column (v).

(vi) The numbers $b_r$ of column (iv) are roughly proportional to $r^{-1}$. This fact suggests that $r b_r$ should be formed and smoothed again in order to improve the smoothing of $\sqrt{n_r}$ still further. This process is of course distinct from assuming that $r b_r'$ should be constant, where the function $b_r'$ is a smoothing of the function $b_r$.

(vii) and (viii) These columns have already been partly explained. The purpose of this improvement in the smoothing is more for the sake of the ratios $n_{r+1}'/n_r'$ than of the $n_r'$ themselves.

(ix) Where the smoothing of $\sqrt{n_r}$ had no noticeable effect we have taken $b_r'^2 = n_r$. It is clearly typical that $b_1'^2 = n_1$, since the eye-smoothing is unlikely to affect $n_1$ convincingly. Therefore if the smoothing is tested by means of a chi-squared test it will be reasonable to subtract about two degrees of freedom.


(x) We have scaled up column (ix) so as to force $\sum_{r=1}^{9} r n_r' = \sum_{r=1}^{9} r n_r$. We can then assume $N' = N$, convenient for applications of §6. Note that $\sum_{r=1}^{9} k_r(n_r' - n_r)^2/n_r' = 6.5$, so that $\chi^2$, given by (19) and accepting (65) as a good enough approximation, is not significant on eight degrees of freedom. Thus our smoothing is satisfactory, though there may be other satisfactory smoothings.

(xi) $r^*$ is obtained from formula (2'). The larger is $r$ the larger is the standard error of $r^*$. We may get some idea of the error by means of an alternative smoothing. The standard error of $1^*$ can be very roughly calculated by an ad hoc argument, inapplicable to say $5^*$. We may reasonably say that the variance of $2n_2'/n_1'$ with respect to all eye-smoothings will be about the same as that obtained by regarding $n_2'$ and $n_1'$ as independent random variables with variances circumscribed by the inequalities (26) and (27), or nearly enough, defined by (65). Now if $w$ and $z$ are independent random variables with expectations $W$ and $Z$, we have
$$\delta\left(\frac{w}{z}\right) \approx \frac{\delta w}{Z} - \frac{W\,\delta z}{Z^2},$$
and hence, to a crude approximation,
$$V\left(\frac{w}{z}\right) \approx \frac{V(w)}{Z^2} + \frac{W^2 V(z)}{Z^4},$$
i.e.
$$\frac{V(w/z)}{(W/Z)^2} \approx \frac{V(w)}{W^2} + \frac{V(z)}{Z^2}. \tag{104}$$
It follows that
$$\frac{V(1^*)}{(1^*)^2} = \frac{V(2n_2'/n_1')}{(2n_2'/n_1')^2} \approx \frac{1}{k_1 n_1'} + \frac{1}{k_2 n_2'}, \tag{105}$$
so that $V(1^*) = 0.73^2 \times 0.0010 = 0.00052$ and $1^* = 0.73 \pm 0.023$.


(xii) (see the second table). An analytic smoothing which is remarkably good for $r \le 15$ is given by $n_r'' = S/(r^2 + r)$. For larger values of $r$ there is a serious discrepancy, since $\sum_{r=16}^{\infty} n_r'' = 374$ while $\sum_{r=16}^{\infty} n_r = 297$. It is clear without reference to the sample that $n_r''$ cannot be satisfactory for sufficiently large values of $r$, since $\sum r n_r'' = \infty$ instead of being equal to $N$.

    (i)        (ii)    (x)     (xii)    (xiii)    (xi)    (xiv)
     r         n_r     n_r'    n_r''    n_r'''    r*      r**
     1         2976    2961    3000     3008      0.73    0.67
     2         1079    1075    1000     1002      1.4     1.5
     3          516     509     500      500      2.4     2.4
     4          294     305     300      301      3.4     3.3
     5          212     209     200      201      4.4     4.3
     6          151     153     143      144      5.4     5.2
     7          105     118     107      108      6.2     6.2
     8           84      91      83       84       -      7.2
     9           86      70      67       67       -      8.2
    10           45       -      55       55       -       -
    11-15       156       -     170      170       -       -
    16-20        76       -      89       89       -       -
    21-30        78       -      92       92       -       -
    31-40        34       -      47       47       -       -
    41-50        28       -      29       28       -       -
    51-60        10       -      19       19       -       -
    61-inf       71       -      98       90       -       -
    4290          1       -       -        -       -       -


























(xiii) The fit can be improved by writing $n_r''' = \lambda x^r/(r^2 + r)$ as in equation (55), i.e. using hypothesis $H_9$. We find by equations (100) and (101) that $\lambda = 6017.4$ and $x = 0.999667$. Column (xiii) can then be easily calculated directly for $r \le 10$ and by use of (102) or (103) for $r > 10$. ((103) gives the correct values for $n_9'''$ and $n_{10}'''$ to the nearest integer, and it gives $\sum_{r \ge 61} n_r''' = 89.96$, as compared with 89.90 when (102) is used.) Note that $\sum_{r=16}^{\infty} n_r''' = 365$, which is an improvement on $n_r''$ but is still significantly too large. A better fit could be obtained by the method of minimum $\chi^2$ or by using some simple convergence factor other than $x^r$, such as $e^{-ar - br^2}$ with $a > 0$, $b > 0$.

(xiv) $r^{**}$ is defined as $(r+1)n_{r+1}''/n_r''$ and is equal to $r(r+1)/(r+2)$. This column may be compared with column (xi). The agreement looks fairly good. It is by no means clear which of the two columns gives more reliable estimates of the 'true' values of $r^*$ for $r \le 7$. Column (x) is a better fit to Eldridge's data for $r \le 9$ (and could be extended to be a better fit for all $r$) than is column (xiii) but is not as smooth. Columns (xii) and (xiii) would be preferable if some theoretical explanation of the analytic forms could be provided. Such an explanation might also show why the fit is not good for large $r$, even with the convergence factor $x^r$. The limitation on $r$ in equation (46) may be relevant.
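Columns (xi) and (xiv) can be regenerated from the smoothed frequencies. The sketch below assumes formula (2') has the form $r^* = (r+1)n_{r+1}'/n_r'$, as the definition of $r^{**}$ in comment (xiv) indicates; the function names are illustrative.

```python
def r_star(smoothed: dict[int, float]) -> dict[int, float]:
    """r* = (r + 1) * n'_{r+1} / n'_r for every r whose successor is also tabulated."""
    return {r: (r + 1) * smoothed[r + 1] / smoothed[r]
            for r in sorted(smoothed) if r + 1 in smoothed}

def r_double_star(r: int) -> float:
    """r** for n''_r = S / (r^2 + r); S cancels, leaving r (r + 1) / (r + 2)."""
    return r * (r + 1) / (r + 2)

# Column (x) of example (ii): eye-smoothed frequencies n'_r for r = 1..9.
N_PRIME = {1: 2961, 2: 1075, 3: 509, 4: 305, 5: 209, 6: 153, 7: 118, 8: 91, 9: 70}

if __name__ == "__main__":
    stars = r_star(N_PRIME)
    for r in range(1, 8):
        print(r, round(stars[r], 2), round(r_double_star(r), 2))
```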

If $H_9$ is true, the population parameter $c_{2,0}$, given by (43), can be expressed in the form
$$c_{2,0} \approx \frac{1}{N^2}\sum_{r=1}^{\infty} r(r-1)n_r''' = \frac{\lambda}{N^2}\left[\frac{x}{1-x} + 2 + \frac{2}{x}\log_e(1-x)\right]. \tag{106}$$

Formula (106) would give $c_{2,0} = 0.00928$, but this value is probably a bad over-estimate since $n_r'''$ is too large for large $r$ and the terms of $\sum r(r-1)n_r'''$ for large $r$ make most of the contribution. Similarly, $\hat c_{2,0}$, given by (30), depends mainly on the larger values of $r$ represented in the sample, but Zipf's summary of Eldridge's data is not complete enough to calculate $\hat c_{2,0}$. Similarly, assuming $H_9$, the entropy, $c_{1,1}$, could be estimated from equation (39), and this method could be expected to give close agreement with the correct value, since $c_{1,1}$ does not depend so very much on the more frequent species. But I have not obtained a closed formula, resembling (106) for example, and the arithmetic required if no closed formula is available would be heavy. The estimation of measures of heterogeneity will be discussed again under example (iii).


Example (iii). Sample of nouns in Macaulay's essay on Bacon. (Taken from Yule (1944), Table 4.4, p. 63.) N = 8045, S = 2048.

     r   n_r       r   n_r       r   n_r       r   n_r        r   n_r
     1   990      11    24      21     1      31     2       41     1
     2   367      12    19      22     4      32     1       45     2
     3   173      13    10      23     7      33     1       48     1
     4   112      14    10      24     2      34     1       57     1
     5    72      15    13      25     1      35     1       58     1
     6    47      16     3      26     5      36     1       65     1
     7    41      17    10      27     3      37     1       76     1
     8    31      18     7      28     4      38     2       81     1
     9    34      19     6      29     1      39     4       89     1
    10    17      20     5      30     3      40     1      255     1











As in example (ii) we can state some conclusions at once, without doing the smoothing. If our foreigner learns all 2048 nouns that occur in the sample his vocabulary will represent all but (12.3 ± 0.5) % of the population, assuming formulae (9) and (65) or (87). If he learns only 1058 nouns his vocabulary will still represent all but $(n_1 + 2n_2)/N = 19.3$ % of the population.
We now present three different smoothings corresponding precisely to those of example (ii).

     r        n_r   n_r'    n_r''   n_r'''   r*     r**    r***   (d/dr)log10 n_r'   (d/dr)log10 n_r'''   g_r log10 e
     1        990   990     1024    1060     0.74   0.67   0.66   -0.50              -0.65                0.184
     2        367   367      341     350     1.4    1.5    1.5    -0.30              -0.37                0.401
     3        173   173      170     174     2.6    2.4    2.4    -0.24              -0.26                0.545
     4        112   112      102     103     3.4    3.3    3.3    -0.17              -0.20                0.654
     5         72    76       68      68     4.4    4.3    4.3    -0.15              -0.16                0.741
     6         47    56       49      48     5.3    5.2    5.1    -0.12              -0.14                0.813
     7         41    42       35.5    36     6.5    6.2    6.1    -0.11              -0.12                0.876
     8         31    34       28.5    28     7.3    7.2    7.1    -0.10              -0.11                0.930
     9         34    27       22.7    22     8.2    8.2    8.1    -0.09              -0.10                0.978
    10         17    22       18.4    18      -     9.2    9.1    -0.08              -0.09                1.021
    11         24    18.5     15.5    15      -    10.2   10.1      -                  -                   -
    12         19    16.0     13.1    12      -    11.1   11.0      -                  -                   -
    13         10    13.7     11.3    10      -    12.1   12.0      -                  -                   -
    14         10    10.9      9.7     9      -    13.1   13.0      -                  -                   -
    15         13     9.6      8.5     8      -    14.1   14.0      -                  -                   -
    16-20      31    32.5     30.5    27      -      -      -       -                  -                   -
    21-30      31     -       31.5    26      -      -      -       -                  -                   -
    31-50      19     -       25.9    19      -      -      -       -                  -                   -
    51-100      6     -       19.9    11      -      -      -       -                  -                   -
    101-inf     1     -       20.3     3.6    -      -      -       -                  -                   -
    255         1     -        -       -      -    254    252       -                  -                   -









































$\sqrt{n_r'}$ was obtained by smoothing $\sqrt{n_r}$ graphically.

$n_r'' = S/(r^2 + r)$. It is curious that this should again give such a good fit for values of $r$ that are not too large ($r \le 30$). The sample is of nouns only and, moreover, Yule took different inflexions of the same word as the same.

$n_r''' = \lambda x^r/(r^2 + r)$, where $\lambda = 2138.90$, $x = 0.991074$, the values being obtained from (100) and (101) as in example (ii).
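The values of $\lambda$ and $x$ quoted for this example can be reproduced by the iteration of §7: start from $Y_1 = 1 + N/S$, repeat $Y_{n+1}^{-1} = (1 - e^{-Y_n})^{-1} - (1 + S/N)^{-1}$, and then set $x = 1 - e^{-Y}$ and $\lambda = xN/(Y - x)$. The sketch below is a plain fixed-point loop; the function name and the fixed iteration count are our own choices.

```python
import math

def fit_h9(N: int, S: int, iterations: int = 200):
    """Fixed-point iteration for Y, followed by x = 1 - exp(-Y) and lambda = x N / (Y - x)."""
    c = 1.0 / (1.0 + S / N)
    Y = 1.0 + N / S                      # starting value Y_1
    for _ in range(iterations):
        Y = 1.0 / (1.0 / (1.0 - math.exp(-Y)) - c)
    x = 1.0 - math.exp(-Y)
    lam = x * N / (Y - x)
    return lam, x

if __name__ == "__main__":
    # N and S of example (iii).
    print(fit_h9(8045, 2048))
```

For these data each step reduces the error by a factor of roughly five, so far fewer than 200 iterations would suffice.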


The expressions $\sum_{r=1}^{15}(n_r' - n_r)^2/n_r'$, etc., take the values 9.5, 21.2 and 27.3. The values of $\chi^2$ would be about 2 or 3 larger. (See (19), (26), (27), (65).) There is no question of accepting $n_r''$ for $r > 50$ but it is better than $n_r'''$ for $r \le 15$. When $r \le 9$ the values of $r^*$ and $r^{**}$ (and therefore of $r^{***}$) show good agreement except for $r = 1$ and $r = 7$. If the analytic smoothings had
not been found, the value of 6* would have been smoothed off, with repercussions on the 
function nj. The discrepancy in 1* must be attributed either to a fault in the value of 
n; (and therefore in H,) or must be blamed on 2, (i.e. on sample variation). If I had not 
noticed the analytic smoothings I would have asserted that 1* = 0-74 with a standard error 
of something like 0-04. (See equation (105).) 

We now consider two of the measures of heterogeneity in the population, namely, $c_{2,0}$ and $c_{1,1}$. By (30) we can see that $\hat c_{2,0} = 0.00272$, agreeing with Yule (1944, p. 57). Also $\hat c_{3,0} = 0.00003957$, so that by (30 A) we may reasonably write $c_{2,0} = 0.00272 \pm 0.00013$. Assuming $H_9$ to be valid for $r \le 30$, we may also estimate $c_{2,0}$ by $\tilde c_{2,0}(30)$ as in equation (33). We have, in a self-explanatory notation,
$$\tilde c_{2,0}(30 \mid H_9) = \frac{1}{N(N-1)}\left\{\lambda\sum_{r=1}^{30}\frac{r-1}{r+1}x^r + \sum_{r=31}^{\infty} r(r-1)n_r\right\}. \tag{107}$$
Now, as in (72),
$$\sum_{r=31}^{\infty}\frac{r-1}{r+1}x^r = \frac{x^{31}}{1-x} - \frac{2}{x}\left[E(-32\log_e x) + \frac{x^{32}}{64}\left(1 - \tfrac{1}{6}\log_e x + \frac{1}{192}\right)\right] = 82.924.$$
But, as in (106), $\sum_{r=1}^{\infty}\frac{r-1}{r+1}x^r = 99.501$, so that $\sum_{r=1}^{30}\frac{r-1}{r+1}x^r = 16.577$. It follows from (107) that $\tilde c_{2,0}(30 \mid H_9) = 0.00246$. This is about two standard errors below its expected value, based on the simple unbiased statistic $\hat c_{2,0}$. The discrepancy may again be attributed to the large value of $n_1'''$. If, instead of $n_r'''$, the smoothing $n_r''$ is accepted for $r \le 30$, we would get $\tilde c_{2,0}(30) = 0.00267$. (It was in order to obtain this comparison that we calculated $\tilde c_{2,0}(30 \mid H_9)$ rather than $\tilde c_{2,0}(50 \mid H_9)$. The fit of $n_r''$ deteriorates at about $r = 30$.)

The last three columns of the table are related to the estimation of the entropy, $c_{1,1}$. (See equation (40) and the remarks following it.) $\frac{d}{dr}\log_{10} n_r'$ was obtained graphically for $r = 1$, 2 and 3 and by numerical differentiation for $r = 3, 4, \dots, 10$. (The graphical and numerical values agreed to two decimal places for $r = 3$.) The column $\frac{d}{dr}\log_{10} n_r'''$ was of course calculated as $\log_{10} x - \left(\frac{1}{r} + \frac{1}{r+1}\right)\log_{10} e$. The crude estimate of the 'entropy to base 10' or 'entropy expressed in decimal digits' is $\log_{10} N - \frac{1}{N}\sum r n_r\log_{10} r = 2.968$ decimal digits. If $n_r'$ is accepted for $r = 1, 2, 3, \dots, 10$ we find that
$$\tilde c_{1,1}(10) = \log_{10} N - \frac{1}{N}\left\{\sum_{r=1}^{10} r n_r'\left(g_r\log_{10} e + \frac{d}{dr}\log_{10} n_r'\right) + \sum_{r=11}^{\infty} r n_r\log_{10} r\right\} = 3.051 \text{ decimal digits.}$$










We shall next calculate é, ,(50| H,), using another self-explanatory notation. Since, by 
Jeffreys & Jeffreys (1946, §15-05), 


9,~loger+ 5 —Toa7 


it can be seen that 


2 Ia d - 
é,,1(50 | H,)=logy, N -= x rn, (0 logig¢ + 5 logign, 
N\ 1 d 


r 
50 50 8lo e 50 ro) 
+ Zrnf logyor +logyy2 ¥ rnf —= 812° S nt + 3.1m, log.o?| 
; 11 il il 51 


= 3-192 decimal digits, 


as we may see by means of rather heavy calculations, using the last column of the table, 
together with equations (72), (74) and (92). The crude estimate of c, , is the smallest of the 
three. This is not surprising, since the crude estimate is always too small in the special case 
of sampling from a population of s species all of which are equally probable. 


Example (iv). Chess openings in games published in the British Chess Magazine, 1951. For 
the purposes of this example we arbitrarily regard the openings of two games as equivalent 
only if the first six moves (three white and three black) are the same and in the same order 
in both games. N = 385, S = 174. 








  r    n_r    n'_r   n''_r   n'''_r    r**
  1    126    126    126     126      0.39
  2     22     22     24.6    24      1.0
  3      5      7.6    8.1     8      2.0
  4      4      4.8    4.0     4      3.0
  5      3      3.2    2.4     2.4    4.0
  6      4      2.6    1.6     1.6    5.0
  7      0      2.2    1.1     1.14   6.0
  8      3      –      0.85    0.86   7.0
  9      1      –      0.66    0.67   8.0
 10      1      –      0.52    0.53   9.0
 11      0
 13      1
 14      1
 16      1
 23      1
 36      1
 r ≥ 11 together:      3.97    4.80

le @) — 


























$\sqrt{n'_r}$ was obtained by graphical smoothing of $\sqrt{n_r}$.

n; was obtained by assuming H, (see equation (52)), ie. nm? = &g,(n,| Hy), where the 
parameters x and A were obtained from (91) and (89). These gave x = 0.99473, A = 49.635
and $n''_r$ for r ≥ 2 is then given by (81). Next $p_0$ was determined as 0.00011304 = 1/8846 by
using equation (80). Then (82) gave $\beta = 2.040$, so that, in accordance with (52) and (74),

$$f(p) = \begin{cases} 0.128\,p^{-2}e^{-2.040p} & (p > 1/8846), \\ 0 & (p < 1/8846). \end{cases}$$

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Finally, equation (84) gives s = 1132. This then is the estimate of the total number of 
openings in the population, though the sample is too small to put any reliance in it. 


$n'''_r$ (r ≥ 2) is simply $(S - n_1)/(r^2 - r) = 48/(r^2 - r)$.


This is just as good a fit as n;. It gives an infinite value to c, o, but this is not as serious an 
objection as it sounds since H, would also give quite the wrong value for c, 9. (Cf. the 
concluding remarks in the discussion of example (i).) 

We list in the table the values of r* corresponding to $n''_r$, calling the values r** in conformity
with the convention of the present section. Clearly r** = (r − 1)x when r ≥ 2. Thus the
average population frequency of the 126 openings that each occurred once only in the sample
is 0.39/385 = 0.001.
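
The smoothed columns of the chess table can be checked numerically. The sketch below is only illustrative: the function names are hypothetical, Turing's adjusted count is taken as r* = (r + 1) n_{r+1}/n_r applied to a smoothed sequence, and the closed form used for $n''_r$ is an assumption (equation (81) is not reproduced here) chosen because it reproduces the tabulated values.

```python
# Constants quoted in the text for example (iv).
S, n1 = 174, 126
A, x = 49.635, 0.99473


def n_double_prime(r):
    # ASSUMED form A * x**r / (r * (r - 1)) for r >= 2; it matches the table.
    return A * x**r / (r * (r - 1))


def n_triple_prime(r):
    # (S - n_1) / (r^2 - r) for r >= 2, as stated in the text.
    return (S - n1) / (r * r - r)


def turing_r_star(smoothed, r):
    # Turing's adjusted count applied to a smoothed sequence of n_r values.
    return (r + 1) * smoothed(r + 1) / smoothed(r)


for r in range(2, 7):
    print(r,
          round(n_double_prime(r), 2),
          round(n_triple_prime(r), 2),
          round(turing_r_star(n_double_prime, r), 3))
```

Under the assumed form the adjusted count reduces algebraically to (r − 1)x, which is the relation quoted for r**.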

A player who learnt all 174 openings would expect to recognize about 67% of future
openings for the same population, assuming that the sample was random. If he learnt the
48 openings that each occurred twice or more in the sample the percentage would drop to
55% and if he learnt the 26 that occurred three times or more the percentage would drop
to 49%. (See formula (6’).)
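
The three percentages can be reproduced with a few lines of arithmetic. This is a hedged sketch: it assumes that formula (6’) amounts to starting from 1 − n_1/N and subtracting r** n_r/N for each class of openings the player does not learn; the helper name is hypothetical.

```python
N = 385
n = {1: 126, 2: 22}          # observed n_r for r = 1, 2
r_adj = {1: 0.39, 2: 1.0}    # r** for r = 1, 2, taken from the table


def recognised_fraction(classes_not_learnt):
    # Proportion of the population covered by the whole sample, minus the
    # estimated mass of the classes that the player chooses not to learn.
    covered = 1 - n[1] / N
    return covered - sum(n[r] * r_adj[r] / N for r in classes_not_learnt)


for dropped in ([], [1], [1, 2]):
    print(dropped, round(100 * recognised_fraction(dropped)))
```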


9. Index of notations having a fixed meaning. 


§1. N, $n_r$ (but see also §2), $n_0$, $q_r$, r* (as a definition of the asterisk, but there is a slight
change of convention in §8), $n'_r$ (here again there is a slight change in §8), ℰ( ), V( ).


§2. 8, Py» (py, Po soon — H, éy, Lr, 4, N- 
§3. N′.


§5. x, = %,, a, 


§6. Cm, n? Emn,o2 Em, o> Em,o(t)> Y; Jr; G1 Em,1(t). 
§7. P,f(p), Po: H, to A,, E( ), | 8, w. 
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CAPTURE-RECAPTURE ANALYSIS 


By J. M. HAMMERSLEY 
Lectureship in the Design and Analysis of Scientific Experiment, University of Oxford 


This paper presents a new method for the analysis of capture-recapture data. Previous methods 
have employed deterministic or partially stochastic models of the population under study; the new 
method not only provides a fully stochastic model, but it also allows death-rates to depend upon both 
current time and the health and age of the separate individuals. The numerical analysis associated with 
the method is, however, heavy, and can only be recommended when one wants to extract the maximum 
amount of information from data which have required much effort or time to collect. The method is 
illustrated by an estimation of the death-rate of Alpine Swifts (Apus melba) in the wild state. Some
thirty years having been spent in collecting these data, it does not seem disproportionate to have 
spent several months in computing the results. The larger part of this calculation was done on ordinary 
desk calculators. 
Some unusual problems in maximum-likelihood estimation are discussed. 


INTRODUCTION 


The literature upon capture-recapture analysis is now extensive.* To acquaint himself 
with it, the reader may consult the references given at the end of this paper, and then follow 
up the references cited in turn in these papers. To make the present paper fairly self- 
contained, however, we begin with a brief outline of the problem. 

The main object is to estimate the (possibly changing) size of a population of individuals 
(e.g. animals, birds, fish, insects, etc.) in their natural state. To this end, the experimenter 
captures individuals from the population on a number of successive occasions. On each 
occasion the catch is to be considered as a random sample of individuals from the population; 
that is to say, each individual in the population (if alive) has an equal chance of being 
captured on any given occasion irrespective of its age, health, type, etc., and of any previous 
captures it may have suffered, although this chance may vary from one occasion to another. 
Each time an individual is caught, a record is marked for it or on it to show the occasion of 
that capture; the individual is then returned to the population. Each individual in the 
population may die at any time, the chance of death being supposed to depend only upon 
the current time, and the current age, type and condition of the individual. The experi- 
menter has to estimate the size of the population at any given instant from records of the 
several occasions of capture. The main difficulty is that this estimation depends upon an 
estimate of the effective death-rate then prevailing in the population, and that this latter 
estimate is hard to make. 

The experimenter may or may not be able to distinguish between one individual and 
another. For instance, in the case of birds, he can place a numbered ring upon the leg of 
each captured bird; and, if all rings bear distinct numbers, he will be able to identify any 
particular bird upon every occasion that it is subsequently recaptured. On the other hand, 
in the case of moths, the marks may be only spots of paint on the wing, the colour and 
position of the marks indicating the dates of capture but failing to discriminate between 
two moths which have both suffered the same history of captures. Merely for convenience 
in describing the theory, we shall suppose that the experimenter has fastened individually 


* Laplace had ideas on the subject 170 years ago, and the last 60 years have produced a steady flow 
of papers. 
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numbered rings upon all individuals at their first capture; but we shall later see that as 
a rule the analysis can still be carried through in the other case. 

Let us then suppose that the experimenter captures individuals at a sequence of discrete
instants $T_1, T_2, \ldots, T_m$; and that the individuals which are captured at some time or another
are named $I_1, I_2, \ldots, I_n$ by means of their distinct rings numbered 1, 2, ..., n. The complete
set of data then takes the form of an m × n matrix in which the element in row i and column j
is 1 or 0 according as the individual $I_j$ was or was not captured (or recaptured) at the instant
$T_i$. Call this data matrix $D = (d_{ij})_{m \times n}$.

There are two ways of analysing D. According to the first scheme, which we call ‘analysis 
by rows’, we first consider any given subset of rows of D, and count the number of columns 
which contain 1 in every row of the subset and zero in every row not belonging to the subset. 
Let us call this the α-number of the subset. We then analyse the set of all α-numbers obtained
by considering every possible subset of rows. This method leads to very considerable algebraic
difficulties since the α-numbers so obtained are not independent of each other, and therefore
cannot be combined in a simple fashion to yield the likelihood of D. Analysis by rows is 
basically the structure which underlies most previous attacks on the capture-recapture 
problem; but, either because of the algebraic difficulties just mentioned or because of 
limitations in the data presented to them, previous authors have adopted one or more of 
the following simplifying expedients: 

(a) Certain subsets of rows are grouped together according to some rule of procedure,
and the analysis is confined to the totals of the α-numbers falling in the groups thus generated.
This pooling of α-numbers may perhaps sacrifice some information. Also it will be
dictated to some extent should the available data not give complete records of individual
histories.

(b) No use is made of the α-numbers corresponding to subsets which comprise a single
row only; that is to say, we disregard those individuals which are captured only once and
never recaptured. This sacrifices information because these special α-numbers influence the
estimates of the experimenter’s catching efficiency, on the precision of which estimates
depends also the precision of the death-rate estimate. Again this will be dictated if the data
does not contain records of individuals that are never recaptured.

(c) It is assumed that the total catch at any given instant of catching is a small fraction
of the population at risk of capture; and that therefore the distribution of the α-numbers,
which is strictly a multihypergeometric distribution, may be replaced by a multinomial
distribution.

(d) It is assumed that death-rates operate deterministically; that is to say, the proportion 
of birds, having any given history of captures, which will die in a specified period of time is 
not subject to random variation. 

(e) It is assumed that death-rate is independent of age and current time. 
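
The α-numbers used in analysis by rows can be tabulated directly from the data matrix. The following is a minimal sketch under the definition given above (each column is classified by the exact set of rows in which it contains 1); the function name and the small matrix are hypothetical illustrations, not data from the paper.

```python
from collections import Counter


def alpha_numbers(D):
    """Map each subset of rows (a tuple of row indices) to the number of
    columns that contain 1 in exactly those rows and 0 elsewhere."""
    m = len(D)
    n = len(D[0]) if m else 0
    counts = Counter()
    for j in range(n):
        pattern = tuple(i for i in range(m) if D[i][j] == 1)
        counts[pattern] += 1
    return counts


# Rows are occasions, columns are individuals (hypothetical data).
D = [
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
alphas = alpha_numbers(D)
for subset, count in sorted(alphas.items()):
    print(subset, count)
```

Since every column falls in exactly one subset, the α-numbers sum to n, which is one way of seeing that they are not independent of each other.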

The alternative scheme is ‘analysis by columns’. This simple change of viewpoint 
(which, for the analysis of the Alpine Swifts, is in a sense to take a bird’s-eye view of 
the data!) sweeps away all the algebraic difficulties and allows us to avoid, if we so desire 
and if the comprehensiveness of the data permits, any or all of the palliatives (a)—(e) above. 
Although analysis by columns has these algebraic advantages, it has, on the other hand, the 
disadvantage that it leads (as might be expected) to more complicated equations and 
considerably heavier computing than in the case when analysis by rows is undertaken with 
one or more of the assumptions (a)–(e). Analysis by columns involves finding separately
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for each column of D the likelihood of the observations in that column. Then the likelihood 
of D is the product of these separate likelihoods; for the histories of two individuals are 
mutually independent under the assumptions made in the second paragraph of this intro- 
duction. The most serious restriction in these assumptions is probably that the chance of 
death depends only upon current time and current age, type and condition of the individual; 
for in reality the death of one individual may affect the chances of another’s survival, as 
in case of over-population. 
One other artifice, which will be fundamental, is the 


THEOREM. Parameter spaces may be transformed before or after maximum-likelihood 
estimation with the same final result in either case. 

This is certainly not an original statement, and it is quite obvious once made. It is, of 
course, false for most forms of estimation other than maximum likelihood. Statisticians, 
to whom I have mentioned it, fall into two classes of roughly equal size: those of one class
knew it from their cradles, those of the other show momentary disbelief. I would have 
belonged to the second class had not this capture-recapture analysis forced the theorem 
to my notice. 
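
The theorem can be illustrated numerically on a binomial likelihood. The sketch below is an illustration only: the logit reparametrization, the ternary-search maximizer and the numbers N = 20, n = 7 are all assumptions introduced for the example and are not taken from the paper.

```python
import math


def ternary_max(f, lo, hi, iters=200):
    # Simple maximizer for a unimodal function on [lo, hi].
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return (lo + hi) / 2


N, n = 20, 7  # hypothetical trials and successes


def loglik_p(p):
    # Log-likelihood in the original parameter p (constant term dropped).
    return n * math.log(p) + (N - n) * math.log(1 - p)


def loglik_theta(theta):
    # The same log-likelihood after the transformation theta = log(p/(1-p)).
    return n * theta - N * math.log1p(math.exp(theta))


p_hat = ternary_max(loglik_p, 1e-9, 1 - 1e-9)
theta_hat = ternary_max(loglik_theta, -20.0, 20.0)
p_from_theta = 1 / (1 + math.exp(-theta_hat))
print(p_hat, p_from_theta, n / N)
```

Maximizing in either parametrization and transforming back gives the same estimate, up to the tolerance of the numerical search.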

The analysis has thrown up a further aspect of maximum-likelihood estimation. Suppose 
that the likelihood L depends upon a parameter $\theta$ with co-ordinates $\theta_1, \theta_2, \ldots, \theta_g$. Then as
a rule the maximum-likelihood solution is obtained as the solution of the simultaneous
equations

$$\partial L/\partial\theta_j = 0 \quad (j = 1, 2, \ldots, g). \qquad (1)$$

What should one do, however, if no solution exists for these equations, or (worse) if the
equations themselves do not exist (for example, the likelihood function may not be differentiable
with respect to $\theta_j$)? Even the rigorous text-books seem to overlook this possibility;
and yet it may arise in the very simplest of practical problems. For instance, suppose that
we wish to estimate the parameter $\pi$ of a binomial distribution having performed N trials
of which n are ‘successful’. The likelihood function is

$$L = \binom{N}{n}\pi^n(1-\pi)^{N-n},$$

and hence

$$\frac{1}{L}\frac{\partial L}{\partial\pi} = \begin{cases} -N/(1-\pi) & (n = 0), \\ (n - N\pi)/\pi(1-\pi) & (0 < n < N), \\ N/\pi & (n = N). \end{cases}$$

When 0 < n < N, we get the usual solution $\pi = n/N$; but when n = 0 or N, the equation
$\partial L/\partial\pi = 0$ has no solution. In these cases it seems reasonable to take $\pi = 0$ and $\pi = 1$
respectively; for, subject to the known information $0 \le \pi \le 1$, these values of $\pi$ make the
likelihood as large as possible. It is easy, however, to construct examples in which no value
of the parameter makes the likelihood as large as possible. For instance, consider the
foregoing binomial distribution, when n = 0 and we have the prior restriction $0 < \pi$. In
general the following procedure may be acceptable to some readers, and I, for one, would
accept it for want of a better. Let $\theta$ be an unknown vector parameter restricted to a known
set $\Theta$, and let $\bar\Theta$ denote the closure of $\Theta$. Since the likelihood function satisfies $0 \le L \le 1$, it
possesses a supremum $\bar L$ for fixed data and $\theta$ belonging to $\Theta$. Let $\epsilon > 0$ be prescribed and let
$\Theta(\epsilon)$ denote the closure of the set of all $\theta$ in $\Theta$ such that $L > \bar L - \epsilon$ for the fixed data. Then
$\Theta(\epsilon_1) \supseteq \Theta(\epsilon_2)$ whenever $\epsilon_1 > \epsilon_2$; and hence $\Theta(0) = \lim_{\epsilon \to 0+} \Theta(\epsilon)$ exists in $\bar\Theta$. It is to be noticed that
$\Theta(0)$ may not have any points in $\Theta$, it may not be a unique point or even a countably infinite
set of points, and that the likelihood for any $\theta$ in $\Theta(0)$ may be less than every L for $\theta$ in $\Theta$.
It is, indeed, possible to construct an example in which the likelihood is not constant and
yet, for almost all data, every point of the parameter space is simultaneously a maximum-likelihood
and a minimum-likelihood solution; for example, it suffices to take L equal to
a positive constant or zero according as $\theta$ is rational or not, except that L is to be suitably
modified for one particular value of the data (thus preventing a solution which is independent
of the data). Nevertheless, I should be prepared to define $\Theta(0)$ as the maximum-likelihood
solution, mainly on the grounds that it reduces to the conventional maximum-likelihood
solution whenever this latter exists, and that in many other cases it seems to produce the
sensible answer. The problem now remains of specifying some expression for the variance
of this estimate; but I have no satisfactory idea how to set about this. In the first place one
has to define what may be meant by the variance of an uncountable statistic. Secondly,
even when the statistic is unique, one has to determine what should replace the conventional
expression $\partial^2\log L/\partial\theta_i\partial\theta_j$. Presumably if one can calculate a variance-covariance matrix
V for any unique $\theta$, then what is wanted in general is an expression of the type MV, where
M is a generalized averaging over $\Theta(0)$. Thus, if $\Theta(0)$ is measurable and not of zero measure,
M might denote normalized integration with respect to this measure. Alternatively, one
might prefer an averaging defined as the extraction of a supremum or infimum; and a
number of other possibilities will suggest themselves to the reader. The only precaution to
observe is that the averaging shall be regular in the sense that it is an identity mapping for
a point estimate $\theta$; for then the definition of the variance-covariance matrix will include the
conventional definition. There is a further point to watch. Suppose equations (1) are soluble
in the ordinary sense except for j = 1. If we now estimate $\theta_1$ by the foregoing general technique
and substitute this solution into the remaining equations, then some or all of these
remaining equations may no longer be soluble. We may expect this to happen when the
estimate lies on the frontier of parameter space; and we shall have an instance of it when we
come to consider the data for the Alpine Swifts.
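
The binomial boundary cases discussed above can be made concrete. This is a minimal sketch with hypothetical function names: the first function takes the boundary value when the score equation has no root, and the second evaluates, for n = 0 under the open restriction 0 < π, the upper end of the near-supremum interval, which here works out to [0, 1 − (1 − ε)^{1/N}].

```python
def binomial_mle(n, N):
    """Maximize pi^n (1 - pi)^(N - n) over the closed interval [0, 1]."""
    if N <= 0 or not 0 <= n <= N:
        raise ValueError("need N > 0 and 0 <= n <= N")
    if n == 0:
        return 0.0  # score is -N/(1 - pi) < 0 everywhere: maximum on the boundary
    if n == N:
        return 1.0  # score is N/pi > 0 everywhere: maximum on the boundary
    return n / N    # interior root of the score equation


def near_supremum_upper_end(N, eps):
    # For n = 0 the likelihood (1 - pi)^N has supremum 1 on 0 < pi < 1, so the
    # closure of {pi : L > 1 - eps} is the interval [0, 1 - (1 - eps)^(1/N)].
    return 1 - (1 - eps) ** (1 / N)


print(binomial_mle(0, 10), binomial_mle(3, 10), binomial_mle(10, 10))
for eps in (1e-2, 1e-4, 1e-8):
    print(eps, near_supremum_upper_end(10, eps))
```

As ε decreases the interval shrinks to the single point π = 0, which lies in the closure of the parameter set but not in the set itself.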


A STOCHASTIC MODEL FOR CAPTURE-RECAPTURE ANALYSIS 


Let $p_i = 1 - q_i$ denote the probability that a particular individual $I_j$, known to be alive on
occasion $T_i$, is captured (or recaptured) on that occasion. Since (by hypothesis) this probability
is constant for all individuals at risk of capture, $p_i$ is an index of the experimenter’s
catching efficiency on the occasion $T_i$. Let $\phi_{ij} = 1 - \psi_{ij}$ denote the probability that the
individual $I_j$ is alive on the occasion $T_{i+1}$, given that it is alive on occasion $T_i$. Let $\mu_{ij}$ denote
the probability that the individual $I_j$ is recaught on some occasion subsequent to $T_i$, given
that it is captured (or recaptured) on occasion $T_i$. Then we have the fundamental difference
equation

$$\mu_{ij} = \phi_{ij}(p_{i+1} + q_{i+1}\mu_{i+1,j}); \quad \mu_{mj} = 0. \qquad (2)$$

The boundary condition $\mu_{mj} = 0$ holds because the experimenter makes no further catches
after the final occasion $T_m$. For a proof of the difference equation itself, suppose we know
that $I_j$ has been caught on occasion $T_i$. Then we know it is alive on occasion $T_i$. If it is recaught,
it must certainly survive until occasion $T_{i+1}$ at least, of which event there is probability
$\phi_{ij}$; and then it must either be recaught on occasion $T_{i+1}$ given that it is then alive
(with probability $p_{i+1}$), or it must escape capture on occasion $T_{i+1}$ (with probability $q_{i+1}$)
and then be captured on some subsequent occasion given that it is alive at $T_{i+1}$ (and the
probability of this is $\mu_{i+1,j}$). The ‘either-or’ of the previous sentence is exclusive; and (2)
follows at once. Now from (2)


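$$\phi_{ij} = \phi_{ij}(p_{i+1} + q_{i+1}) \ge \phi_{ij}(p_{i+1} + q_{i+1}\mu_{i+1,j}) = \mu_{ij}$$

and

$$\mu_{ij} = \phi_{ij}\mu_{i+1,j} + \phi_{ij}p_{i+1}(1 - \mu_{i+1,j}) \ge \phi_{ij}\mu_{i+1,j},$$

because the various probabilities lie between 0 and 1 inclusive. By putting $p_{i+1} = 1$ and 0
respectively in these inequalities we deduce that

$$\phi_{ij} \ge \mu_{ij} \ge \phi_{ij}\mu_{i+1,j} \qquad (3)$$

is the best possible set of inequalities operating upon the $\mu$’s. Hence (3) defines the parameter
space $\mu$. The parameter space $\phi$ is evidently the closed unit hypercube. From (2)
we also have

$$\mu_{i-1,j} - \phi_{i-1,j} = \phi_{i-1,j}(p_i - 1 + q_i\mu_{ij}) = q_i\phi_{i-1,j}(\mu_{ij} - 1);$$

and so

$$q_i = \frac{\phi_{i-1,j} - \mu_{i-1,j}}{\phi_{i-1,j}(1 - \mu_{ij})} \quad (i = 2, 3, \ldots, m), \qquad (4)$$

$$\frac{p_i}{q_i} = \frac{\phi_{i-1,j} - \phi_{i-1,j}q_i}{\phi_{i-1,j}q_i} = \frac{\phi_{i-1,j}(1 - \mu_{ij}) - (\phi_{i-1,j} - \mu_{i-1,j})}{\phi_{i-1,j} - \mu_{i-1,j}} = \frac{\mu_{i-1,j} - \phi_{i-1,j}\mu_{ij}}{\phi_{i-1,j} - \mu_{i-1,j}} \quad (i = 2, 3, \ldots, m). \qquad (5)$$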
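
The difference equation (2), the inequalities (3) and the inversion (4) can be checked numerically for a single individual. The following is a minimal sketch with hypothetical names and made-up probabilities; indices are 0-based, so `phi[i]` is the survival probability from occasion i to occasion i + 1.

```python
def recapture_probs(phi, p):
    """Backward recursion of equation (2) for one individual.
    p[i] is the capture probability on occasion i (i = 0..m-1);
    phi[i] is the survival probability from occasion i to i+1 (i = 0..m-2).
    Returns mu with mu[m-1] = 0 (the boundary condition)."""
    m = len(p)
    mu = [0.0] * m
    for i in range(m - 2, -1, -1):
        mu[i] = phi[i] * (p[i + 1] + (1 - p[i + 1]) * mu[i + 1])
    return mu


def q_from_mu(phi, mu):
    # Equation (4): q_i = (phi_{i-1} - mu_{i-1}) / (phi_{i-1} (1 - mu_i)).
    return [(phi[i - 1] - mu[i - 1]) / (phi[i - 1] * (1 - mu[i]))
            for i in range(1, len(mu))]


phi = [0.8, 0.7, 0.9]        # hypothetical survival probabilities
p = [0.5, 0.3, 0.6, 0.4]     # hypothetical capture probabilities
mu = recapture_probs(phi, p)
q_back = q_from_mu(phi, mu)
print(mu)
print(q_back)
```

Recovering q from the μ's in this way is the transformation from p-space to μ-space that the later analysis relies on.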


Suppose $I_j$ is captured on occasions $T_\alpha, T_\beta, \ldots, T_\theta$ and on no other occasions, and that we
consider only information upon $I_j$ on occasions from $T_\alpha$ onwards. The likelihood is then

$$L_j = \phi_{\alpha j}\phi_{\alpha+1,j} \cdots \phi_{\theta-1,j}\; p_\alpha p_\beta \cdots p_\theta\; \frac{q_\alpha q_{\alpha+1} \cdots q_\theta}{q_\alpha q_\beta \cdots q_\theta}\,(1 - \mu_{\theta j}). \qquad (6)$$

The likelihood of D will now be

$$L = L_1 L_2 \cdots L_n. \qquad (7)$$

If we wish to take into account the possibility that $I_j$ has remained uncaught in the population
from some unknown instant of entry K(j), then $L_j$ must be multiplied by

$$\phi_{K(j),j}\,\phi_{K(j)+1,j} \cdots \phi_{\alpha-1,j}\; q_{K(j)}\,q_{K(j)+1} \cdots q_{\alpha-1}$$


in which K(j) is a parameter to be estimated (or a random variable to be integrated out). 
The maximum-likelihood solution results from choosing, subject to (3), the available 
parameters so that L is as large as possible. Before we can do this, however, we must by 
some assumption reduce the number of unknown parameters; for at present there are more 
unknown parameters than observations (i.e. elements of D), and hence the parameters 
cannot be uniquely determined. The analysis will, moreover, not be tractable from the 
numerical point of view unless we keep the number of unknown parameters fairly small 
(say less than 100). Thus we might assume that all individuals were equally healthy. At 
the same time, also for numerical convenience, we may employ the theorem stated in the 
introduction. According to this we may transform the parameter space before maximizing 
(7), and then, after estimates have been found in this transformed space, we can transform 
them back into the original space. We shall adopt this technique in the illustrations on Alpine 
Swifts by employing (4) and (5) to carry p-space into $\mu$-space. Another technique worth
considering is the deliberate use of an excess number of parameters, by the introduction of 
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new parameters which depend in a functionally determinate manner upon the minimal set 
of parameters assumed in formulating the model. In the augmented parameter space the 
minimal set will be some hypersurface. We may then maximize (7) upon this hypersurface 
by the use of Lagrangian undetermined multipliers. This will prove a convenient procedure 
when it is easier to work with many simple equations than with a smaller number of com- 
plicated ones. 

The parameters which are sought are the $\phi$’s and the p’s. The former specify death-rate,
and the latter lead immediately to estimates of population size. 
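
Analysis by columns can be sketched in a few lines. The code below is an illustration under simplifying assumptions that are not part of the general model: survival probabilities are taken to be the same for every individual, no unknown instant of entry is modelled, and all names and numbers are hypothetical. Each column contributes the likelihood (6), and the likelihood of D is their product as in (7).

```python
import math


def recapture_probs(phi, p):
    # Equation (2), solved backwards from the boundary value mu = 0.
    mu = [0.0] * len(p)
    for i in range(len(p) - 2, -1, -1):
        mu[i] = phi[i] * (p[i + 1] + (1 - p[i + 1]) * mu[i + 1])
    return mu


def column_likelihood(history, phi, p, mu):
    """Likelihood (6) for one individual's 0/1 capture history, counted
    from its first capture onwards."""
    caught = [i for i, d in enumerate(history) if d == 1]
    first, last = caught[0], caught[-1]
    L = 1.0
    for i in range(first, last):          # survival between first and last capture
        L *= phi[i]
    for i in range(first, last + 1):      # caught or missed on each occasion
        L *= p[i] if history[i] else (1 - p[i])
    return L * (1 - mu[last])             # never caught again after the last capture


def data_likelihood(D, phi, p):
    mu = recapture_probs(phi, p)
    m, n = len(D), len(D[0])
    return math.prod(
        column_likelihood([D[i][j] for i in range(m)], phi, p, mu)
        for j in range(n)
    )


phi, p = [0.8], [0.5, 0.3]                # hypothetical two-occasion experiment
mu = recapture_probs(phi, p)
L_10 = column_likelihood([1, 0], phi, p, mu)
L_11 = column_likelihood([1, 1], phi, p, mu)
print(L_10, L_11, data_likelihood([[1, 1], [0, 1]], phi, p))
```

In the two-occasion case the two possible histories that start with a capture on the first occasion have likelihoods summing to the first capture probability, which is a convenient sanity check.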


AN EXPERIMENT WITH ALPINE SWIFTS 


We shall analyse some data collected upon two colonies of Alpine Swifts (Apus melba) at 
Solothurn in Switzerland from 1920 to 1950. 

A bird aged x years is called a nestling if x < 1, young if 1 ≤ x < 2, and adult if x ≥ 2. An
ornithologist can tell the difference between these three categories upon inspection. Alpine 
Swifts migrate annually. If a bird returns to the place of catching when it is adult, it is 
ornithologically reasonable to suppose that it will thereafter continue to return to this same 
place each year until it dies. This does not necessarily hold for nestlings or young. When 
a bird is caught for the first time a ring is placed upon its leg. All rings bear distinct numbers. 
Only a small number of birds are caught when young; and it is convenient, in view of the 
anomalous migration of young, to suppress all records of capturing young. A bird is said to 
be ‘ringed’ when it is caught for the first time in the unsuppressed records. We shall suppose 
that an adult bird has a constant probability $\phi = 1 - \psi$ of surviving one year. This is a much
simpler assumption than that of the previous general treatment. It is not an unfair supposition
in view of the fact that most birds die due to natural hazards rather than through old
age. These hazards vary from year to year, for meteorological reasons amongst others; but
the estimate, which is wanted, is naturally a hypothetical one referring to an ‘average’ bird
in an ‘average’ year. Both these last two uses of the word ‘average’ are defined by the
assumption that $\phi$ is to be regarded as a constant; that is to say, the death-rate for Alpine
Swifts is the metric upon which an average is taken rather than, say, the mean winter
temperatures or the supply of summer food. Since $\phi$ no longer depends upon a suffix j we
may write $\mu_i$ for the probability that an adult bird (known to be alive in year i) is subsequently
recaught. To deal with the anomalous behaviour of nestlings and young, we write
$\pi$ for the probability that a nestling (ringed at birth) survives and returns to the place of
catching two years later. Define $\omega = \pi/\phi$.

We can now write down the probabilities for the various possible records on a single bird: 

[1] Bird ringed in the unsuppressed records as an adult in year i and not recaught:

$$L_1 = (1 - \mu_i). \qquad (8)$$

[2] Bird ringed in the unsuppressed records as an adult in year i and recaught in years
$\alpha, \beta, \ldots, \theta$ and in no other years:

$$L_2 = \phi^{\theta-i}\, p_\alpha p_\beta \cdots p_\theta\, \frac{q_{i+1}q_{i+2} \cdots q_\theta}{q_\alpha q_\beta \cdots q_\theta}\,(1 - \mu_\theta). \qquad (9)$$

[3] Bird ringed as a nestling in year i − 1 and not recaught in the unsuppressed records:

$$L_3 = (1 - \pi) + \pi q_{i+1}(1 - \mu_{i+1}) = 1 - \omega\phi + \omega\phi(1 - p_{i+1} - q_{i+1}\mu_{i+1}) = 1 - \omega\mu_i, \qquad (10)$$

by virtue of (2).

[4] Bird ringed as a nestling in year i − 1 and in the unsuppressed records recaught in
years $\alpha, \beta, \ldots, \theta$ and in no other years:

$$L_4 = \text{expression (9) multiplied by } \omega. \qquad (11)$$

[5] Bird found dead in year $\theta$: in expression (9) we must replace $\phi(1 - \mu_\theta)$ by $\psi$. Hence it
is enough to introduce the factor

$$\frac{\psi}{\phi(1 - \mu_\theta)}. \qquad (12)$$

[6] Bird killed artificially for experimental purposes in year $\theta$: introduce the factor

$$\frac{1}{1 - \mu_\theta}. \qquad (13)$$





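To simplify the ensuing numerical analysis we now transform from the parameter space
$(\phi, \pi, p)$ to the space $(\phi, \omega, \mu)$. This is justified by the theorem quoted in the introduction
and effected by substituting (4) and (5) into (9) to yield

$$\phi^{\theta-i}\, p_\alpha p_\beta \cdots p_\theta\, \frac{q_{i+1}q_{i+2} \cdots q_\theta}{q_\alpha q_\beta \cdots q_\theta}\,(1 - \mu_\theta) = \phi^{\theta-i}\left(\frac{p_\alpha}{q_\alpha}\right) \cdots \left(\frac{p_\theta}{q_\theta}\right) q_{i+1} \cdots q_\theta\,(1 - \mu_\theta)$$

$$= \phi^{\theta-i}\left(\frac{\mu_{\alpha-1} - \phi\mu_\alpha}{\phi - \mu_{\alpha-1}}\right) \cdots \left(\frac{\mu_{\theta-1} - \phi\mu_\theta}{\phi - \mu_{\theta-1}}\right)\left(\frac{\phi - \mu_i}{\phi(1 - \mu_{i+1})}\right) \cdots \left(\frac{\phi - \mu_{\theta-1}}{\phi(1 - \mu_\theta)}\right)(1 - \mu_\theta). \qquad (14)$$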


Upon noting that $\mu_m = 0$ and collecting up expressions (8)–(14) inclusive, we find for the
likelihood of the whole set of data D,

$$L = \omega^A\left(\frac{1 - \phi}{\phi}\right)^F \prod_{i=1}^{m-1}\{(1 - \omega\mu_i)^{B_i}(1 - \mu_i)^{C_i}(\phi - \mu_i)^{D_i}(\mu_i - \phi\mu_{i+1})^{E_i}\}, \qquad (15)$$

where in the unsuppressed records

A = number of nestlings which are recaptured;

$B_i$ = number of nestlings ringed in year i − 1 which are never recaptured;

$C_i$ = (number of adults ringed in year i and never recaptured)
 − (number of adults ringed before year i and recaptured after year i)
 − (number of nestlings ringed before year i − 1 and recaptured after year i)
 − (number of birds killed artificially or recaptured dead in year i);

$D_i$ = number of birds which, having been ringed either as adults before year i + 1 or as
 young before year i, were recaptured sometime after year i + 1 but which were not
 recaptured in year i + 1;

$E_i$ = number of birds recaptured in year i + 1;

F = number of birds found dead but not artificially killed.


It is thus seen that, under suitable assumptions, it is not necessary to know individual
histories.
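
Once the counts A, F and $B_i$, $C_i$, $D_i$, $E_i$ are available, the logarithm of the likelihood can be evaluated directly. The sketch below assumes that (15) has the product form $\omega^A((1-\phi)/\phi)^F\prod_i(1-\omega\mu_i)^{B_i}(1-\mu_i)^{C_i}(\phi-\mu_i)^{D_i}(\mu_i-\phi\mu_{i+1})^{E_i}$; the function name and the tiny test data are hypothetical.

```python
import math


def log_likelihood(phi, omega, mu, A, F, B, C, D, E):
    """Logarithm of the product form assumed for (15).
    mu has one more entry than B, C, D, E; its last entry is the boundary
    value 0. Terms with a zero exponent are skipped so that a factor lying
    on the frontier of the parameter space does not produce log(0)."""
    total = A * math.log(omega) + F * (math.log(1 - phi) - math.log(phi))
    for i in range(len(B)):
        factors = (
            (B[i], 1 - omega * mu[i]),
            (C[i], 1 - mu[i]),
            (D[i], phi - mu[i]),
            (E[i], mu[i] - phi * mu[i + 1]),
        )
        for exponent, base in factors:
            if exponent != 0:
                total += exponent * math.log(base)
    return total


# One adult ringed and never recaptured: the likelihood is 1 - mu_1, as in (8).
ll_single = log_likelihood(0.8, 0.5, [0.3, 0.0], A=0, F=0, B=[0], C=[1], D=[0], E=[0])
print(ll_single, math.log(0.7))
```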


Let us renumber the suffices i so that they run over the values i = 20, 21, ..., 50 corresponding
to the years 1920-50 inclusive. The data then yields A = 205, F = 6 and

  i    B_i    C_i    D_i    E_i
 20      0     +1      2      0
 21     17      0      2      0
 22     18     −2      6      1
 23      0     −5      4      4
 24     12      0     ay      7
 25     33     +2      3      7
 26     25     −5     11      0
 27     12      0     11      0
 28     35    −11     17      0
 29     25    −17     17      7
 30      0    −21     15      7
 31     27    −17     10     18
 32     24    −23     18     18
 33     42    −29     28     25
 34     60    −37     31     28
 35     82    −61     34     42
 36    100    −51     26     58
 37    129    −15     44     16
 38    103    −41     51      9
 39     82    −61     57      3
 40    129    −58     47     17
 41    178    −57     41     21
 42    125    −53     36     37
 43    175    −44     60     23
 44    113    −65     51     33
 45    132    −66     60     43
 46    155    −90     61     61
 47    189    −96     59     81
 48    216   −103     4Z     87
 49      3    −58      0     84
 50    328    +18      0      0
                                        (16)

The summary (16) shows that the terms in L involving $\mu_{20}$ are

$$(1 - \mu_{20})(\phi - \mu_{20})^2.$$

To maximize L we must therefore minimize $\mu_{20}$. According to (3) we get

$$\mu_{20} = \phi\mu_{21}. \qquad (17)$$

Now substitute (17) into (15) and pick out the terms in $\mu_{21}$,

$$(1 - \omega\mu_{21})^{17}(\phi - \mu_{21})^2(1 - \phi\mu_{21})(\phi - \phi\mu_{21})^2.$$

Once again use (3) to obtain from (17)

$$\mu_{20} = \phi\mu_{21} = \phi^2\mu_{22}. \qquad (18)$$

We can now eliminate $\mu_{20}$ and $\mu_{21}$ from (15). The terms in $\mu_{22}$ become

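$$\phi^4(1 - \phi\omega\mu_{22})^{17}(1 - \omega\mu_{22})^{18}(\phi - \mu_{22})^6(1 - \phi\mu_{22})^2(1 - \phi^2\mu_{22})(\mu_{22} - \phi\mu_{23}). \qquad (19)$$

The analysis may proceed in a straightforward manner until we reach the next run of zeros
in $E_i$. Treating these in a similar fashion we find

$$\mu_{26} = \phi\mu_{27} = \phi^2\mu_{28} = \phi^3\mu_{29}. \qquad (20)$$

Upon elimination of $\mu_{26}$, $\mu_{27}$ and $\mu_{28}$ from (15), the stage is set for logarithmic differentiation,
and the estimation follows from the simultaneous solution, subject to (3), of

$$\frac{E_i}{\mu_i - \phi\mu_{i+1}} = \frac{B_i\omega}{1 - \omega\mu_i} + \frac{C_i}{1 - \mu_i} + \frac{D_i}{\phi - \mu_i} + \frac{\phi E_{i-1}}{\mu_{i-1} - \phi\mu_i} \quad (i = 23, 24, 30, 31, \ldots, 48, 49), \qquad (21)$$

$$\frac{1}{\mu_{22} - \phi\mu_{23}} = \frac{18\omega}{1 - \omega\mu_{22}} + \frac{17\phi\omega}{1 - \phi\omega\mu_{22}} + \frac{6}{\phi - \mu_{22}} + \frac{2\phi}{1 - \phi\mu_{22}} + \frac{\phi^2}{1 - \phi^2\mu_{22}}, \qquad (22)$$

$$\frac{7}{\mu_{25} - \phi^4\mu_{29}} = \frac{33\omega}{1 - \omega\mu_{25}} + \frac{3}{\phi - \mu_{25}} + \frac{2}{1 - \mu_{25}} + \frac{7\phi}{\mu_{24} - \phi\mu_{25}}, \qquad (23)$$

$$\frac{7}{\mu_{29} - \phi\mu_{30}} = \frac{25\omega}{1 - \omega\mu_{29}} + \frac{35\phi\omega}{1 - \phi\omega\mu_{29}} + \frac{12\phi^2\omega}{1 - \phi^2\omega\mu_{29}} + \frac{25\phi^3\omega}{1 - \phi^3\omega\mu_{29}}$$
$$\qquad + \frac{17}{\phi - \mu_{29}} + \frac{11\phi^2}{1 - \phi^2\mu_{29}} - \frac{5\phi^3}{1 - \phi^3\mu_{29}} + \frac{7\phi^4}{\mu_{25} - \phi^4\mu_{29}}, \qquad (24)$$

$$\mu_{50} = 0, \qquad (25)$$

$$0 = \frac{\partial\log L}{\partial\omega} = \frac{205}{\omega} - {\sum_i}' \frac{B_i\mu_i}{1 - \omega\mu_i} - \frac{17\phi\mu_{22}}{1 - \phi\omega\mu_{22}} - \frac{18\mu_{22}}{1 - \omega\mu_{22}} - \frac{33\mu_{25}}{1 - \omega\mu_{25}}$$
$$\qquad - \frac{25\phi^3\mu_{29}}{1 - \phi^3\omega\mu_{29}} - \frac{12\phi^2\mu_{29}}{1 - \phi^2\omega\mu_{29}} - \frac{35\phi\mu_{29}}{1 - \phi\omega\mu_{29}} - \frac{25\mu_{29}}{1 - \omega\mu_{29}}, \qquad (26)$$

$$0 = \frac{\partial\log L}{\partial\phi} = {\sum_i}'\left\{\frac{D_i}{\phi - \mu_i} - \frac{E_i\mu_{i+1}}{\mu_i - \phi\mu_{i+1}}\right\} + \frac{37}{\phi} - \frac{6}{1 - \phi}$$
$$\qquad - \frac{17\omega\mu_{22}}{1 - \phi\omega\mu_{22}} + \frac{6}{\phi - \mu_{22}} - \frac{2\mu_{22}}{1 - \phi\mu_{22}} - \frac{2\phi\mu_{22}}{1 - \phi^2\mu_{22}}$$
$$\qquad + \frac{3}{\phi - \mu_{25}} - \frac{28\phi^3\mu_{29}}{\mu_{25} - \phi^4\mu_{29}} - \frac{\mu_{23}}{\mu_{22} - \phi\mu_{23}} - \frac{35\omega\mu_{29}}{1 - \phi\omega\mu_{29}}$$
$$\qquad - \frac{24\phi\omega\mu_{29}}{1 - \phi^2\omega\mu_{29}} - \frac{75\phi^2\omega\mu_{29}}{1 - \phi^3\omega\mu_{29}} + \frac{17}{\phi - \mu_{29}}$$
$$\qquad - \frac{22\phi\mu_{29}}{1 - \phi^2\mu_{29}} + \frac{15\phi^2\mu_{29}}{1 - \phi^3\mu_{29}} - \frac{7\mu_{30}}{\mu_{29} - \phi\mu_{30}}. \qquad (27)$$

In equations (26) and (27), ${\sum_i}'$ denotes summation over i = 23, 24, 30, 31, ..., 48, 49. The
method of solving the foregoing set of 28 simultaneous equations will be discussed in the
next section.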
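
Each equation of type (21) can be solved explicitly for $\mu_{i+1}$ once $\mu_{i-1}$ and $\mu_i$ are known, which is what makes a step-by-step calculation possible. The sketch below reads (21) as $E_i/(\mu_i - \phi\mu_{i+1}) = B_i\omega/(1-\omega\mu_i) + C_i/(1-\mu_i) + D_i/(\phi-\mu_i) + \phi E_{i-1}/(\mu_{i-1}-\phi\mu_i)$, the form obtained by logarithmic differentiation of (15); the function name and the trial values of the parameters are hypothetical.

```python
def right_hand_side(mu_prev, mu_i, phi, omega, B, C, D, E_prev):
    # Right-hand side of the assumed form of (21) for one year i.
    return (B * omega / (1 - omega * mu_i)
            + C / (1 - mu_i)
            + D / (phi - mu_i)
            + phi * E_prev / (mu_prev - phi * mu_i))


def next_mu(mu_prev, mu_i, phi, omega, B, C, D, E, E_prev):
    """Solve E / (mu_i - phi * mu_next) = rhs for mu_next.
    Requires E != 0; years with E = 0 are the ones handled through (3)."""
    rhs = right_hand_side(mu_prev, mu_i, phi, omega, B, C, D, E_prev)
    return (mu_i - E / rhs) / phi


# Counts for year 31 from the table (B, C, D, E) with E for year 30; the
# values of phi, omega and the two mu's are made-up trial values.
phi, omega = 0.8, 0.3
mu_30, mu_31 = 0.5, 0.45
mu_32 = next_mu(mu_30, mu_31, phi, omega, B=27, C=-17, D=10, E=18, E_prev=7)
print(mu_32)
```

Nothing in this step guarantees that the new value respects the inequalities (3); with poor trial values it can leave the admissible region, which is why the trial values need adjusting.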

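The next stage is the determination of the inverse variance-covariance matrix for
$(\phi, \omega, \mu)$. As already mentioned in the introduction, it is not clear what one should do with
co-ordinates on the frontier of the parameter space. In default of a better procedure, I have
regarded (18) and (20) as algebraic identities; and thus taken

$$p_{21} = p_{22} = p_{27} = p_{28} = p_{29} = 0$$

as quantities not subject to sampling error. It then remains to calculate

$$c_{ij} = \frac{\partial}{\partial\mu_i}\left(\frac{1}{L}\frac{\partial L}{\partial\mu_j}\right) \quad (i, j = 22, 23, 24, 25, 29, 30, \ldots, 48, 49);$$

$$c_{i\omega} = \frac{\partial}{\partial\mu_i}\left(\frac{1}{L}\frac{\partial L}{\partial\omega}\right), \quad c_{i\phi} = \frac{\partial}{\partial\mu_i}\left(\frac{1}{L}\frac{\partial L}{\partial\phi}\right) \quad (i = 22, 23, 24, 25, 29, 30, \ldots, 48, 49);$$

$$c_{\phi\phi} = \frac{\partial}{\partial\phi}\left(\frac{1}{L}\frac{\partial L}{\partial\phi}\right), \quad c_{\phi\omega} = \frac{\partial}{\partial\phi}\left(\frac{1}{L}\frac{\partial L}{\partial\omega}\right), \quad c_{\omega\omega} = \frac{\partial}{\partial\omega}\left(\frac{1}{L}\frac{\partial L}{\partial\omega}\right).$$

The algebraic expressions for these quantities are lengthy but quite easily obtained from L,
and therefore will not be printed here. The only non-vanishing c’s are $c_{ii}$, $c_{\phi\phi}$, $c_{\omega\omega}$,
$c_{\omega\phi} = c_{\phi\omega}$, $c_{i,i+1} = c_{i+1,i}$, $c_{i\omega} = c_{\omega i}$, $c_{i\phi} = c_{\phi i}$.
If we write

$$C = \begin{pmatrix} c_{ij} & c_{i\omega} & c_{i\phi} \\ c_{\omega j} & c_{\omega\omega} & c_{\omega\phi} \\ c_{\phi j} & c_{\phi\omega} & c_{\phi\phi} \end{pmatrix} \quad (i, j = 22, 23, 24, 25, 29, 30, \ldots, 49),$$

then $-C^{-1}$ is the variance-covariance matrix of the estimates in the parameter space
$(\mu, \omega, \phi)$. To transform to the $(p, \omega, \phi)$ space we take a first variation of (4) and obtain

$$\delta q_i = -\frac{1}{\phi(1 - \mu_i)}\,\delta\mu_{i-1} + \frac{\phi - \mu_{i-1}}{\phi(1 - \mu_i)^2}\,\delta\mu_i + \frac{\mu_{i-1}}{\phi^2(1 - \mu_i)}\,\delta\phi,$$

with the exception

$$\delta q_{26} = -\frac{1}{\phi(1 - \phi^3\mu_{29})}\,\delta\mu_{25} + \frac{\phi^2(\phi - \mu_{25})}{(1 - \phi^3\mu_{29})^2}\,\delta\mu_{29} + \left\{\frac{\mu_{25}}{\phi^2(1 - \phi^3\mu_{29})} + \frac{3\phi\mu_{29}(\phi - \mu_{25})}{(1 - \phi^3\mu_{29})^2}\right\}\delta\phi.$$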


Consider now the 26 x 27 matrix U whose rows and columns are indexed according to the
following scheme:

columns: 22, 23, 24, 25, 29, 30, ..., 48, 49, $\omega$, $\phi$
rows:    23, 24, 25, 26, 30, 31, ..., 48, 49, $\omega$, $\phi$        (28)

and whose elements vanish except in the last column and on the leading diagonal and super-
diagonal. These remaining elements are specified by

$u_{i+1,i} = -1/\{\phi(1-\mu_{i+1})\}$ ($i$ = 22, 23, 24, 29, 30, ..., 47, 48);

$u_{ii} = (\phi-\mu_{i-1})/\{\phi(1-\mu_i)^2\}$, $u_{i\phi} = \mu_{i-1}/\{\phi^2(1-\mu_i)\}$ ($i$ = 23, 24, 25, 30, 31, ..., 48, 49);

$u_{26,25} = -1/\{\phi(1-\phi^3\mu_{29})\}$; $u_{26,29} = \phi^2(\phi-\mu_{25})/(1-\phi^3\mu_{29})^2$;

$u_{26,\phi} = \frac{\mu_{25}}{\phi^2(1-\phi^3\mu_{29})} + \frac{3\phi\mu_{29}(\phi-\mu_{25})}{(1-\phi^3\mu_{29})^2}$;

$u_{\omega\omega} = u_{\phi\phi} = 1$; $u_{\omega,49} = u_{\omega\phi} = u_{\phi\omega} = 0$.


Then the variance-covariance matrix for estimates in the ($p$, $\omega$, $\phi$) space is $-UC^{-1}U'$,
whose rows and columns are indexed in the same fashion as the rows of U in (28).
The estimates of p themselves follow from (4).
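As a numerical illustration, the printed catching efficiencies can be recovered from the fitted values of $\phi$ and $\mu$. Equation (4) is not reproduced in this part of the text, so the sketch below assumes the relation $p_i = 1 - (\phi - \mu_{i-1})/\{\phi(1 - \mu_i)\}$, which is inferred from the printed figures. The function name and dictionary are hypothetical.

```python
# Assumed form of relation (4), inferred from the printed tables:
#   p_i = 1 - (phi - mu_{i-1}) / (phi * (1 - mu_i))
PHI = 0.82177252  # value of phi used in the final major cycle

# A few of the printed mu values (index -> value).
MU = {
    22: 0.4911470078,
    23: 0.5528378505,
    47: 0.6324897872,
    48: 0.5702121461,
    49: 0.4073192325,
}

def catching_efficiency(i, mu=MU, phi=PHI):
    """Catching efficiency for index i under the assumed relation."""
    q = (phi - mu[i - 1]) / (phi * (1.0 - mu[i]))
    return 1.0 - q

for i in (23, 48, 49):
    print(1900 + i, round(catching_efficiency(i), 3))
```

Under this assumption the three values agree with the tabulated efficiencies for 1923, 1948 and 1949 to three decimal places.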


NUMERICAL ANALYSIS 


The main numerical labour lies in the solution of the 28 simultaneous equations (21)-(27)
inclusive. This was done by iteration as follows:

(i) Choose trial values of $\phi$, $\omega$ and $\mu_{22}$. Then use equations (21)-(24) inclusive to calculate
in succession $\mu_{23}, \mu_{24}, \mu_{25}, \mu_{29}, \mu_{30}, \ldots, \mu_{49}, \mu_{50}$, this last value of $\mu_i$ coming from (21) with
$i$ = 49. This sequence of calculations will be called a minor cycle. The final value of $\mu_{50}$ so
obtained should be zero to agree with (25); but in fact it will not be, due to the inexact trial
choice of $\mu_{22}$. So keeping $\phi$, $\omega$ both fixed, adjust the trial value of $\mu_{22}$ and begin a new minor
cycle. Repeat the minor cycles with successive adjustments of $\mu_{22}$ until the resulting value
of $\mu_{50}$ satisfies (25).

Then

(ii) use equations (26) and (27) to calculate $\Omega$ and $\Phi$. The work so far covered in (i) and
(ii) will be called a major cycle.

(iii) Do two more major cycles with fresh trial values of $\phi$, $\omega$. We now have three pairs
of values of $\Omega$, $\Phi$. Carry out a bivariate inverse interpolation in the ($\Omega$, $\Phi$)-plane to estimate
values of $\phi$, $\omega$ such that $\Omega = \Phi = 0$.

(iv) Finally, repeat the major cycles, with successive approximations upon $\phi$, $\omega$ until
(26) and (27) are satisfied.
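The minor cycle in step (i) is a one-dimensional root search: the starting value is adjusted until the value at the end of the chain vanishes. The sketch below shows only that inner search, using bisection. Equations (21)-(24) are not reproduced here, so `minor_cycle` is a toy stand-in chosen only so that the end value increases with the starting value; all names and numbers are hypothetical.

```python
def minor_cycle(mu_start, phi, omega, steps=5):
    """Toy stand-in for equations (21)-(24): propagate mu forward.

    The real recurrences are not reproduced here; this map is only
    chosen so that the end value increases with the starting value.
    """
    mu = mu_start
    for _ in range(steps):
        mu = (1.0 + phi) * mu - omega
    return mu

def solve_start(phi, omega, lo=0.0, hi=1.0, tol=1e-12):
    """Bisect on the starting value until the end value is zero."""
    f_lo = minor_cycle(lo, phi, omega)
    f_hi = minor_cycle(hi, phi, omega)
    if f_lo * f_hi > 0:
        raise ValueError("root not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if minor_cycle(mid, phi, omega) * f_lo > 0:
            lo = mid   # same sign as the lower end: root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

start = solve_start(phi=0.8, omega=0.15)
print(start, minor_cycle(start, 0.8, 0.15))
```

A major cycle would wrap this search inside an outer adjustment of the two remaining parameters, which is not attempted here.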

In carrying out the minor cycles we have to observe (3). Thus any minor cycle can be
stopped, when only partly completed, as soon as (3) is violated by a poor choice of $\mu_{22}$.
This helps to shorten the work considerably. Increasing $\mu_{22}$ increases all the other $\mu$'s for
fixed $\phi$, $\omega$; and decreasing $\mu_{22}$ decreases all the other $\mu$'s. The later values of $\mu$ are extremely
sensitive to small changes in $\mu_{22}$; in the final solution a change of $10^{-10}$ in $\mu_{22}$ gave a change of
$4 \times 10^{-5}$ in $\mu_{50}$. Similarly, in the major cycles $\Omega$ and $\Phi$ were very sensitive to changes in $\mu_{22}$.
On that account the work was carried out retaining 10 significant figures throughout, with
a number of interpolating refinements built into the final stages. After the final major cycle,
the computing errors in $\phi$ and $\omega$ were determined variationally and found to be 4 x 10-*
and 1 x 10-* respectively. This leaves sufficient margin in view of the standard errors of
these quantities.

All the foregoing work was done on ordinary desk calculators. A minor cycle could be 
completed in about a day; and a major cycle in about 10 days. 

The 27 x 27 matrix C was evaluated on a desk calculator. C was inverted on SEAC. This
required a couple of days for coding, followed by about 4 hr. machine time. In view of the
large proportion of zeros in C, however, this inversion could probably have been accom-
plished with about two months' work on a desk calculator. Finally, the standard errors were
calculated from the diagonal elements of $-UC^{-1}U'$ on a desk calculator.


NUMERICAL RESULTS 


The final major cycle employed the values

$\phi$ = 0.82177 252, $\omega$ = 0.15231 687,

   i    $\mu_i$
  22    0.49114 70078
  23    0.55283 78505
  24    0.48551 39880
  25    0.39790 15014
  29    0.57753 15026
  30    0.62797 49077
  31    0.68978 94273
  32    0.65635 69119
  33    0.64690 45524
  34    0.63783 00155
  35    0.63643 78806
  36    0.57590 81349
  37    0.37055 55369
  38    0.37884 51461
  39    0.42295 19018
  40    0.50268 84610
  41    0.54322 89445
  42    0.57591 99689
  43    0.54504 16713
  44    0.59087 04335
  45    0.61829 34809
  46    0.64147 85385
  47    0.63248 97872
  48    0.57021 21461
  49    0.40731 92325
  50    0.00000 00000

these values being interpolated between the independent minor cycles

$\mu_{22}$ = 0.49114 70079 leading to $\mu_{50}$ = +0.00002 70661,
$\mu_{22}$ = 0.49114 70078 leading to $\mu_{50}$ = -0.00001 02407.
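The interpolation between the two minor cycles can be illustrated by a straight-line inverse interpolation on the printed end values. This is only a sketch: the text does not say which interpolation rule was used, and the function name is hypothetical.

```python
# Two minor cycles bracketing the root (values as printed above).
mu22_a, mu50_a = 0.4911470079, +0.0000270661
mu22_b, mu50_b = 0.4911470078, -0.0000102407

def inverse_linear_interpolation(x0, y0, x1, y1):
    """Return x at which the straight line through the two points hits y = 0."""
    return x0 - y0 * (x1 - x0) / (y1 - y0)

mu22_root = inverse_linear_interpolation(mu22_a, mu50_a, mu22_b, mu50_b)
slope = (mu50_a - mu50_b) / (mu22_a - mu22_b)  # sensitivity of the end value

print(f"{mu22_root:.12f}", f"{slope:.3g}")
```

The slope, a few times $10^5$, is the sensitivity described in the numerical analysis: a change of $10^{-10}$ in the starting value moves the end value by about $4 \times 10^{-5}$.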


For a reason to be noticed presently, we classify all birds as either X-birds or Y-birds. A bird 
is an X-bird if and only if it is a nestling when ringed, and is a Y-bird if and only if it is an 
adult when ringed. Since ringing, by definition, refers only to the unsuppressed records, 
these two classes are mutually exclusive and exhaustive. The following table summarizes 
for each year the number of birds captured, together with estimates of the experimenter's
catching efficiency and of the current population size:

         Number of birds caught       Catching efficiency        Population size
Year   X-birds  Y-birds  All birds   (p_i ± standard error)    (± standard error)
1920       0       3        3              -                        -
1921      17       2       19              0                        -
1922      23       0       23              0                        -
1923       2       2        4        0.100 ± 0.096              40 ± 38
1924      19      11       30        0.364 ± 0.154              82 ± 35
1925      36      16       52        0.320 ± 0.105             162 ± 53
1926      32       6       38        0.241 ± 0.084             158 ± 55
1927      12      11       23              0                        -
1928      37       4       41              0                        -
1929      33       0       33              0                        -
1930       7       3       10        0.201 ± 0.071              50 ± 18
1931      38       7       45        0.240 ± 0.082             188 ± 64
1932      38      17       55        0.533 ± 0.094             103 ± 18
1933      64      20       84        0.430 ± 0.080             195 ± 36
1934      82      25      107        0.412 ± 0.065             260 ± 41
1935     113      22      135        0.384 ± 0.060             351 ± 54
1936     141      35      176        0.468 ± 0.056             376 ± 45
1937     170      64      234        0.525 ± 0.052             446 ± 45
1938     121      17      138        0.116 ± 0.028            1190 ± 285
1939      91       9      100        0.066 ± 0.022            1515 ± 494
1940     136       2      138        0.024 ± 0.014            5726 ± 3279
1941     202       8      210        0.150 ± 0.035            1401 ± 326
1942     143      25      168        0.201 ± 0.041             837 ± 173
1943     213      42      255        0.342 ± 0.050             745 ± 109
1944     139      16      155        0.177 ± 0.035             876 ± 172
1945     172      30      202        0.264 ± 0.042             765 ± 121
1946     191      39      230        0.309 ± 0.041             743 ± 100
1947     236      60      296        0.403 ± 0.043             734 ± 78
1948     271      58      329        0.464 ± 0.041             709 ± 63
1949      47      72      119        0.484 ± 0.043             246 ± 22
1950     285      55      340              -                        -
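The tabulated population sizes appear to be the total catch divided by the catching efficiency. That relation is an inference from the printed figures rather than something stated in this section, and the short check below (with a hypothetical function name) tests it on a few rows.

```python
# (year, all birds caught, printed catching efficiency, printed population size)
rows = [
    (1923, 4, 0.100, 40),
    (1924, 30, 0.364, 82),
    (1932, 55, 0.533, 103),
    (1949, 119, 0.484, 246),
]

def population_estimate(catch, efficiency):
    """Assumed relation: population size is roughly catch / efficiency."""
    return catch / efficiency

for year, catch, eff, printed in rows:
    print(year, round(population_estimate(catch, eff)), printed)
```

Small differences in other rows would be expected, since the printed efficiencies are rounded to three decimals.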














The expectation of life of a bird is $1/(1-\phi)$ years, and this, together with its standard
error, is 5.61 ± 0.30 years.

The probability that a bird will survive 2 years from birth and then return to its place of
birth is $\phi\omega$, and this, together with its standard error, is 0.1252 ± 0.0088. Thus about one
nestling in eight returns two years after birth.
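Both point estimates follow directly from the values of $\phi$ and $\omega$ used in the final major cycle, as the short check below shows. The standard errors are not reproduced, since they require the full covariance matrix.

```python
phi = 0.82177252    # value used in the final major cycle
omega = 0.15231687

life_expectancy = 1.0 / (1.0 - phi)   # years
return_probability = phi * omega      # nestling returns two years later

print(round(life_expectancy, 2))      # 5.61
print(round(return_probability, 4))   # 0.1252
```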


COMMENT ON THE DATA AND THEIR NUMERICAL RESULTS


The data were collected by H. Arn of Solothurn, who was primarily interested in quite 
different problems to those for which his data have now been used. In particular, because 
of his other interests, he made a deliberate attempt in some years to catch unringed adults 
and in other years to ring as many nestlings as possible. As a result the catching is not 
random, the various birds at risk have not always had equal chances of capture, and the 
fundamental assumption of the present capture-recapture method is not satisfied. The table 
above shows evidence of this lack of random catching. For example, in 1949 the experimenter
appears to have concentrated on capturing adult birds; and in this year he has secured an
unusually large number of Y-birds in a fairly small total catch. As a result in 1949 his 
catching efficiency (based upon the false assumption that he had been catching at random) 
is artificially inflated, and therefore the estimate of population size is made much too low. 
The reverse effect takes place in the years 1939, 1940 and 1941 when he has evidently con- 
centrated on ringing nestlings. 

It was nevertheless decided to analyse the data as though they had been based upon 
random catching, because considerable ornithological interest attaches to the age of 
Alpine Swifts in that they enjoy an unusually long expectation of life in the wild state, 
and because the present data is quite unique in its extensiveness. Since the assumption of 
random catching is not here valid, the numerical results must be interpreted with a good 
deal of reserve. The individual year-by-year population sizes carry very little weight, 
although the general tendency of the size of the population to increase with time is borne 
out by other evidence not presented here. The probability of 1/8 for the return of a nestling 
two years later is also distinctly suspect. On the other hand, the estimate of expectation of 
life is in good accord with estimates obtained from other less extensive experiments, and is 
less liable to be incorrect although its standard error may well be wrong. 

The fact that the numerical results are for these reasons suspect will concern the ornitho- 
logist: but the main object of the present paper is to provide a theoretical analysis of the 
general capture-recapture problem, and to show that the analysis can in fact be handled 
numerically. This has been demonstrated. 


I am indebted to Herr H. Arn of Solothurn, who collected the original data upon Alpine 
Swifts, and to Dr D. Lack of the Edward Grey Institute of Field Ornithology, to whom the 
data were communicated and by whom they were transmitted to me. I am also indebted to 
the computing staff of the Lectureship in the Design and Analysis of Scientific Experiment, 
University of Oxford, for carrying out much of the numerical analysis. In particular, I am 
glad to thank Mrs M. Parke who carried the main burden of the work. I also wish to thank 
Dr J. Curtiss, Mr J. Todd, and the staff of the Computation Laboratory of the National 
Bureau of Standards, Washington, D.C., for the use of SEAC to invert the 27 x 27 reciprocal 
variance-covariance matrix. I should like to record helpful discussions on the subject of 
capture-recapture analysis with Dr D. J. Finney, Prof. R. A. Fisher, Mr P. H. Leslie, and 
Prof. P. A. P. Moran. Finally, I am indebted to the referee for useful comments. 
THE USE OF CHAIN-BINOMIALS WITH A VARIABLE
CHANCE OF INFECTION FOR THE ANALYSIS 
OF INTRA-HOUSEHOLD EPIDEMICS 


By NORMAN T. J. BAILEY 
Nuffield Lodge, Regent’s Park, London 


1. INTRODUCTION 


An important characteristic of highly infectious diseases is that they tend to produce groups 
of cases occurring within households rather than separate cases scattered throughout the 
community. If all the cases in a household were due to some single source of infection such 
as typhoid-infected water, then we should expect the total number of cases to follow a 
binomial distribution. On the other hand, if the disease were introduced into the household 
by one of its members and then transmitted from person to person, a very different situation 
would arise. For a disease involving a short period of high infectivity and an approximately 
constant incubation period, we should expect to be able to distinguish different generations 
of the intra-household epidemic. There might be a binomial distribution of secondary cases 
resulting from contact with the primary case, to be followed later by another binomial 
distribution of tertiary cases amongst the susceptibles who had previously escaped, and 
so on. In his classic paper on this subject Greenwood (1931) introduced such a chain- 
binomial model for the investigation of measles epidemics, for which the assumptions made 
are thought to be approximately true. When the period of infectivity is more extended, as 
with scarlet fever or whooping cough for example, the stochastic model recently discussed 
by Bailey (1953) may be more appropriate. Greenwood examined data on the 1926 measles 
epidemic in St Pancras, and showed that, so far as the total number of cases in a household 
was concerned, the hypothesis of a simple binomial distribution was quite inadequate, 
while the distribution expected from the chain-binomial model gave a satisfactory fit to 
the observations. It is important to notice, however, that this material provided information 
only on the total size of the epidemic in a given household; no analysis of the individual 
links of the chain was possible. Now Wilson, Bennett, Allen & Worcester (1939), in their 
investigation of cases of measles occurring in Providence, Rhode Island, during 1929-34, 
were able to go a stage further and break down the data into its constituent parts. Of course 
it must be recognized that departures from a constant incubation period, or the possibility 
of multiple primary cases, give rise to certain difficulties in the so-called chaining of this 
kind of material. However, it seemed likely that in the present case this source of confusion 
would not be serious. Wilson et al. were able to show that in none of the groups of available 
data did the Greenwood model give an adequate description of the numbers of cases 
occurring in the separate generations through which the epidemic passed, though it did 
sometimes give a satisfactory fit to the distribution of the total number of cases in a house- 
hold. In a later paper, Wilson (1947) considered a somewhat different approach. According 
to this, in families containing two susceptibles in addition to the first case, for example, 
one could estimate separately the chance of one susceptible being attacked and the chance 
that the other would be attacked given that the first had already become infected. This
principle can be extended to families of any size, though it has the appreciable disadvantage
that it is not clearly related to the epidemiological process known to be taking place. A further
discussion of these problems was undertaken by Greenwood (1949). He made the important 
observation that the failure of the binomial distribution to represent the distribution of 
cases at stages in the chain where a goodness-of-fit is possible could be due to variations 
between households in the average chance of infection. Such a state of affairs might well 
be expected to result from natural variations in both heredity and environment. This remark 
of Greenwood’s is taken as the starting point for the present paper, in which it is shown that 
a satisfactory mathematical model can be obtained, at least for the small amount of suitable 
data available, by retaining the hypothesis of a chain of binomial distributions, together 
with the additional assumption that the average chance of infection varies between different 
households according to a suitable $\beta$-distribution. This involves fitting two parameters
instead of only one as previously. Another important modification of detail in Greenwood’s 
original scheme has lately been suggested by Lidwell & Sommerville (1951). Greenwood 
(1931) assumed that the chance of any susceptible being infected was independent of the 
number of infectious persons to whom he is exposed. Lidwell & Sommerville, however, 
considered the alternative assumption that the risk of infection for any susceptible is the 
same with respect to each infectious individual. This of course affects only families with 
more than two susceptibles in addition to the primary case. In their analysis of the dis- 
tributions of the total number of cases of the common cold in households, Lidwell & Sommer- 
ville showed that on the whole the Greenwood model was inadequate whereas their own 
modified scheme fitted quite well. For a useful discussion of the epidemiological aspects 
of the problem of estimating infectiousness Hope Simpson’s (1952) recent paper may be 
consulted. 

Households with only two individuals, i.e. containing only one susceptible in addition 
to the primary case, permit one to estimate the average chance of cross-infection, but are 
too small to exhibit a chain of cases. We shall therefore not consider these any further, but 
shall proceed to a discussion of households of three and four persons, for which there is 
observational material available for testing. The treatment of larger households is more 
complicated though similar in principle. 


2. ANALYSIS OF HOUSEHOLDS WITH THREE INDIVIDUALS 


We will now consider households of three individuals, containing two susceptibles besides 
the primary case. Let us suppose that the probability of a susceptible contracting the 
disease on exposure to an infectious case is $p = 1 - q$. Then the first generation of secondary
cases, following the primary case, will take the values 0, 1 or 2, with frequencies $q^2$, $2pq$
and $p^2$. For the values 0 and 2 the epidemic within the household comes to an end; in the
first case there is no further source of infection, and in the second there are no more suscep-
tibles. On the other hand, a single secondary case may be followed in the second generation
by 0 or 1 tertiary case, with frequencies $q$ and $p$. The expected and observed numbers of
households falling into the four classes are shown in Table 1, which also includes some 
suitable material taken from the Providence measles experience already referred to 
(Wilson et al. 1939), together with an obvious notation for the different kinds of chain. 
Wilson et al. analysed their material in several different ways, both including and excluding 
children under 7 months or over 10 years old. Moreover, some of their tables are really
relevant to families containing more than three children. We have decided for the present 
purpose to confine ourselves to the data on households of three, containing just two further 
susceptibles, apart from the primary case, at risk between the ages of 7 months and 10 
years. This seemed to be the most reasonable way of selecting what might be expected to 


Table 1. Greenwood chain for households of three

Type of chain    Expected no.     Observed no.     Providence measles    Fitted
                 of households    of households    data (see text)       values
1→0              nq^2             a                  34                   14.9
1→1→0            2npq^2           b                  25                   23.5
1→1→1→0          2np^2q           c                  36                   87.7
1→2→0            np^2             d                 239                  207.9
Total            n                n                 334                  334.0























be a fairly homogeneous group of data. In analysing their data Wilson et al. estimated the 
parameter p by equating the expected size of the total epidemic to the observed value. This 
is not necessarily very efficient. They then examined the fit to the four classes shown in 
Table 1. A more satisfactory procedure is to estimate p by maximum likelihood using all 
four classes shown in the table. It is easy to see that this efficient estimate is 


$\hat{p} = (b + 2c + 2d)/(2a + 3b + 3c + 2d)$, (1)
with large sample variance $\mathrm{var}\,\hat{p} = pq/\{2n(1 + pq)\}$. (2)
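Formulas (1) and (2) can be checked numerically on the Providence counts of Table 1. The sketch below computes the estimate, its large-sample standard error, the expected counts under the Greenwood chain, and the Pearson goodness-of-fit statistic. Variable names are illustrative only.

```python
# Observed Providence counts for the four chain types (Table 1).
a, b, c, d = 34, 25, 36, 239
n = a + b + c + d

# Maximum-likelihood estimate, formula (1).
p_hat = (b + 2*c + 2*d) / (2*a + 3*b + 3*c + 2*d)
q_hat = 1.0 - p_hat

# Large-sample standard error from formula (2).
se = (p_hat * q_hat / (2 * n * (1 + p_hat * q_hat))) ** 0.5

# Expected counts n*q^2, 2n*p*q^2, 2n*p^2*q, n*p^2 and Pearson chi-square.
expected = [n*q_hat**2, 2*n*p_hat*q_hat**2, 2*n*p_hat**2*q_hat, n*p_hat**2]
chi2 = sum((o - e)**2 / e for o, e in zip((a, b, c, d), expected))

print(round(p_hat, 3), round(se, 4), [round(e, 1) for e in expected], round(chi2, 1))
```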


Applying formula (1) to the data shown in Table 1, we find $\hat{p}$ = 0.789. The corresponding
expectations appear in the right-hand column. The goodness-of-fit $\chi^2$ is 59.8, which for
2 degrees of freedom gives P < 0.001. Thus the Greenwood model, which for groups of three
is unaffected by the modification suggested by Lidwell & Sommerville, is quite inappropriate
as it stands. Comparing the actual observations with their expectations in Table 1 shows
that there are more families than expected, in which the epidemic either does not go beyond
the primary case or else involves the remaining two susceptibles immediately. This fits
in with Greenwood's (1949) suggestion that the chance of infection may vary from family
to family. Suppose we now assume that this source of variation in p between households
is represented by the $\beta$-distribution

$dF = \frac{1}{B(x, y)}\, p^{x-1}(1-p)^{y-1}\, dp$, $(0 < p < 1)$. (3)


All we have to do is to average the expectations for each kind of chain over all households
and exhibit them as functions of x and y. As the individual frequencies in Greenwood or
Lidwell chains can always be represented in terms of quantities like $p^r q^s$, it is convenient
to work out the expectation of this quantity. We have

$E\,p^r q^s = \int_0^1 p^{x+r-1} q^{y+s-1}\, dp \,/\, B(x, y) = B(x+r, y+s)/B(x, y)$.

Therefore
$E\,p^r q^s = x(x+1)\ldots(x+r-1)\,y(y+1)\ldots(y+s-1)/(x+y)(x+y+1)\ldots(x+y+r+s-1)$. (4)
In particular

$E\,p^r = x(x+1)\ldots(x+r-1)/(x+y)(x+y+1)\ldots(x+y+r-1)$, (5)
and $E\,q^s = y(y+1)\ldots(y+s-1)/(x+y)(x+y+1)\ldots(x+y+s-1)$. (6)
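The product form (4) can be checked against the ratio of Beta functions from which it was derived. The sketch below uses hypothetical function names and evaluates the Beta function through `math.lgamma`.

```python
import math

def rising(z, k):
    """Rising factorial z(z+1)...(z+k-1); equals 1 when k = 0."""
    out = 1.0
    for j in range(k):
        out *= z + j
    return out

def moment(x, y, r, s):
    """E[p^r q^s] for p following the beta-distribution (3), as in formula (4)."""
    return rising(x, r) * rising(y, s) / rising(x + y, r + s)

def log_beta(u, v):
    """Logarithm of the Beta function B(u, v)."""
    return math.lgamma(u) + math.lgamma(v) - math.lgamma(u + v)

def moment_via_beta(x, y, r, s):
    """The same quantity written as B(x + r, y + s) / B(x, y)."""
    return math.exp(log_beta(x + r, y + s) - log_beta(x, y))

print(moment(1.092, 0.264, 2, 1), moment_via_beta(1.092, 0.264, 2, 1))
```

Setting s = 0 or r = 0 in `moment` gives the special cases (5) and (6).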


If we apply these results to the simple expectations shown in Table 1 we obtain the derived
set of values set out in Table 2, which we shall proceed to analyse by the familiar technique
of maximum-likelihood scoring. The maximum-likelihood scores for x and y are clearly

$S_x = \left(\frac{n-a}{x} + \frac{c+d}{x+1}\right) - \left(\frac{n}{x+y} + \frac{n}{x+y+1} + \frac{b+c}{x+y+2}\right)$,
                                                                                                    (7)
and $S_y = \left(\frac{n-d}{y} + \frac{a+b}{y+1}\right) - \left(\frac{n}{x+y} + \frac{n}{x+y+1} + \frac{b+c}{x+y+2}\right)$.

Table 2. Modified chain for household of three with variable p

Type of chain    Expected no. of households               Observed no.     Providence data    Fitted
                                                          of households    (see text)         values
1→0              ny(y+1)/(x+y)(x+y+1)                     a                  34                 34.9
1→1→0            2nxy(y+1)/(x+y)(x+y+1)(x+y+2)            b                  25                 22.7
1→1→1→0          2nyx(x+1)/(x+y)(x+y+1)(x+y+2)            c                  36                 37.6
1→2→0            nx(x+1)/(x+y)(x+y+1)                     d                 239                238.8
Total            n                                        n                 334                334.0
The expected amounts of information are

$I_{xx} = \frac{n(x+2y+1)}{x(x+y)(x+y+1)} + \frac{nx(x+3y+2)}{(x+1)(x+y)(x+y+1)(x+y+2)} - J$,

$I_{xy} = -J$,
                                                                                                    (8)
$I_{yy} = \frac{n(2x+y+1)}{y(x+y)(x+y+1)} + \frac{ny(3x+y+2)}{(y+1)(x+y)(x+y+1)(x+y+2)} - J$,

where $J = \frac{n}{(x+y)^2} + \frac{n}{(x+y+1)^2} + \frac{2nxy}{(x+y)(x+y+1)(x+y+2)^2}$.








In many cases it will be easier to use the observed amounts of information, which can be 
derived from the three different expressions in brackets making up the two scores in (7) 
simply by replacing the denominators by their squares. 

An easy way to obtain two rough estimates of x and y to commence the iterative process
is to amalgamate the second and third classes in Table 2, and then equate the observations
to their expectations. This gives

$\hat{x} = (b+c)(b+c+2d)/\{4ad - (b+c)^2\}$,
                                                                  (9)
and $\hat{y} = (b+c)(2a+b+c)/\{4ad - (b+c)^2\}$.
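The starting values (9) and the scores (7) are simple to evaluate. The sketch below applies them to the Providence counts for households of three, evaluating the scores at the starting values rounded to three decimals. The function name is hypothetical.

```python
a, b, c, d = 34, 25, 36, 239   # Providence counts, households of three
n = a + b + c + d

# Rough starting values from formula (9).
denom = 4*a*d - (b + c)**2
x0 = (b + c) * (b + c + 2*d) / denom
y0 = (b + c) * (2*a + b + c) / denom

def scores(x, y):
    """Maximum-likelihood scores S_x and S_y as written in (7)."""
    common = n/(x + y) + n/(x + y + 1) + (b + c)/(x + y + 2)
    s_x = (n - a)/x + (c + d)/(x + 1) - common
    s_y = (n - d)/y + (a + b)/(y + 1) - common
    return s_x, s_y

# Evaluate at the starting values rounded to three decimals.
x0, y0 = round(x0, 3), round(y0, 3)
print(x0, y0, scores(x0, y0))
```

Each scoring step then adds the product of the covariance matrix and the score vector to the current estimates.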
If we now use these results on the Providence measles data, we obtain first the initial
estimates from equation (9), giving

$\hat{x}$ = 1.142 and $\hat{y}$ = 0.273,

for which values the scores are $S_x$ = -1.1253 and $S_y$ = +2.1255. The information and
covariance matrices are respectively

$\mathbf{I} = \begin{pmatrix} 63.8027 & -226.696 \\ -226.696 & \text{[illegible]} \end{pmatrix}$ and $\mathbf{V} = 10^{-2}\begin{pmatrix} 6.2401 & +1.3151 \\ +1.3151 & \text{[illegible]} \end{pmatrix}$. (10)

The adjustments to be made to $\hat{x}$ and $\hat{y}$ are therefore -0.042 and -0.007 giving 1.100 and
0.266, respectively. A further stage of iteration makes a slight change to

$\hat{x}_3$ = 1.092 ± 0.250 and $\hat{y}_3$ = 0.264 ± 0.061, (11)


where we have followed the estimates by their standard errors derived from the covariance 
matrix, and the subscripts refer to the size of the household. The expected values are given 
in the right-hand column of Table 2. The fit to the data is now very good indeed, the $\chi^2$ with
one degree of freedom being about 0.32. It is interesting to observe that since y is less than
unity the $\beta$-distribution is J-shaped with an infinite ordinate at p = 1. This is in keeping
with the excess of families observed in Table 1, in which both susceptibles succumbed 
immediately to infection by the primary case. 
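The fitted column of Table 2 and the quoted $\chi^2$ can be reproduced from the estimates in (11). This is a minimal sketch, assuming the rounded estimates x = 1.092 and y = 0.264.

```python
observed = (34, 25, 36, 239)
n = sum(observed)
x, y = 1.092, 0.264            # estimates quoted in (11)

s = x + y
expected = [
    n * y*(y + 1) / (s*(s + 1)),                 # 1 -> 0
    2*n * x*y*(y + 1) / (s*(s + 1)*(s + 2)),     # 1 -> 1 -> 0
    2*n * y*x*(x + 1) / (s*(s + 1)*(s + 2)),     # 1 -> 1 -> 1 -> 0
    n * x*(x + 1) / (s*(s + 1)),                 # 1 -> 2 -> 0
]
chi2 = sum((o - e)**2 / e for o, e in zip(observed, expected))
print([round(e, 1) for e in expected], round(chi2, 2))
```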


3. ANALYSIS OF HOUSEHOLDS WITH FOUR INDIVIDUALS


When we come on to households with a total of four susceptibles, i.e. three in addition to 
the primary case, then the modification introduced by Lidwell & Sommerville begins to 
have an effect. It influences the tertiary distribution, in which, when there are exactly two 
secondary cases, the values 0 and 1 will occur with probabilities $q^2$ and $1 - q^2$ instead of
$q$ and $p$. Let us again use the Providence measles data, confining ourselves to households
of four, containing just three further susceptibles, apart from the primary cases, at risk 
between the ages of 7 months and 10 years. The appropriate expectations and observations 
are all collected together in Table 3. The maximum likelihood estimate for the Greenwood 
model is plainly 


$\hat{p} = (b + 2c + 2d + 3e + 3f + 3g + 3h)/(3a + 5b + 6c + 4d + 6e + 5f + 4g + 3h)$. (12)


Table 3. Greenwood and Lidwell chains for households of four

                   Expected no. of households       Observed no.     Providence      Fitted values
Type of chain      Greenwood      Lidwell           of households    data          Greenwood   Lidwell
1→0                nq^3           nq^3              a                  4              0.9        1.2
1→1→0              3npq^4         3npq^4            b                  3              0.4        0.7
1→1→1→0            6np^2q^4       6np^2q^4          c                  1              0.7        1.0
1→2→0              3np^2q^2       3np^2q^3          d                  8              8.2        2.2
1→1→1→1→0          6np^3q^3       6np^3q^3          e                  4              2.7        3.4
1→1→2→0            3np^3q^2       3np^3q^2          f                  3              6.5        7.3
1→2→1→0            3np^3q         3np^2q(1-q^2)     g                 10             31.0       38.7
1→3→0              np^3           np^3              h                 67             49.6       45.5
Total              n              n                 n                100            100.0      100.0
Substituting the Providence data gives $\hat{p}$ = 0.791, and the expected numbers are as shown
in the table. The goodness-of-fit test, amalgamating the first four classes and then the next
two, gives P < 0.001. To estimate p on the Lidwell model we have to solve

$S_p = \frac{b+2c+2d+3e+3f+2g+3h}{p} - \frac{3a+4b+4c+3d+3e+2f+g}{q} + \frac{2gq}{1-q^2} = 0$. (13)

Taking a few trial values and then interpolating inversely soon gives the required root.
We find, for the data of Table 3, that $\hat{p}$ = 0.769. Again a very poor fit is obtained with
P < 0.001.
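Both estimates for households of four can be reproduced from the observed counts. The sketch below evaluates (12) directly and finds the root of the Lidwell score (13) by bisection, which is used here in place of trial values and inverse interpolation. The count a = 4 is taken from the observed column of Table 4.

```python
# Observed Providence counts for households of four.
a, b, c, d, e, f, g, h = 4, 3, 1, 8, 4, 3, 10, 67

# Greenwood model: closed-form estimate, formula (12).
p_greenwood = (b + 2*c + 2*d + 3*e + 3*f + 3*g + 3*h) / (
    3*a + 5*b + 6*c + 4*d + 6*e + 5*f + 4*g + 3*h)

def lidwell_score(p):
    """Score for the Lidwell model, equation (13)."""
    q = 1.0 - p
    return ((b + 2*c + 2*d + 3*e + 3*f + 2*g + 3*h) / p
            - (3*a + 4*b + 4*c + 3*d + 3*e + 2*f + g) / q
            + 2*g*q / (1.0 - q*q))

# Bisection: the score is positive below the root and negative above it.
lo, hi = 0.5, 0.95
while hi - lo > 1e-10:
    mid = 0.5 * (lo + hi)
    if lidwell_score(mid) > 0:
        lo = mid
    else:
        hi = mid
p_lidwell = 0.5 * (lo + hi)

print(round(p_greenwood, 3), round(p_lidwell, 3))
```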
Table 4. Modified Greenwood chain for households of four with variable p

Type of chain    Expected no. of households                                Observed no.     Providence    Fitted
                                                                           of households    data          values
1→0              ny(y+1)(y+2)/(x+y)...(x+y+2)                              a                  4             4.9
1→1→0            3nxy(y+1)(y+2)(y+3)/(x+y)...(x+y+4)                       b                  3             2.6
1→1→1→0          6nx(x+1)y(y+1)(y+2)(y+3)/(x+y)...(x+y+5)                  c                  1             1.9
1→2→0            3nx(x+1)y(y+1)/(x+y)...(x+y+3)                            d                  8             5.0
1→1→1→1→0        6nx(x+1)(x+2)y(y+1)(y+2)/(x+y)...(x+y+5)                  e                  4             2.1
1→1→2→0          3nx(x+1)(x+2)y(y+1)/(x+y)...(x+y+4)                       f                  3             3.0
1→2→1→0          3nx(x+1)(x+2)y/(x+y)...(x+y+3)                            g                 10            13.3
1→3→0            nx(x+1)(x+2)/(x+y)...(x+y+2)                              h                 67            67.1
Total            n                                                         n                100            99.9























We now turn to our previous assumption that p varies between households and has the 
$\beta$-distribution given by (3). Applying this modification to the expectations of the Greenwood
model in a manner similar to that employed for households of size three, we obtain
the expectations and observations shown in Table 4, which is analogous to Table 2. The 
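maximum likelihood scores are

$S_x = \left(\frac{n-a}{x} + \frac{n-a-b}{x+1} + \frac{e+f+g+h}{x+2}\right) - \left(\frac{n}{x+y} + \frac{n}{x+y+1} + \frac{n}{x+y+2} + \frac{n-a-h}{x+y+3} + \frac{b+c+e+f}{x+y+4} + \frac{c+e}{x+y+5}\right)$,
                                                                                                    (14)
$S_y = \left(\frac{n-h}{y} + \frac{n-g-h}{y+1} + \frac{a+b+c+e}{y+2} + \frac{b+c}{y+3}\right) - (\ldots)$,

where the second member of $S_y$ in brackets is the same as the second member of $S_x$. The
information functions are

$I_{xx} = \left(\frac{n-a}{x^2} + \frac{n-a-b}{(x+1)^2} + \frac{e+f+g+h}{(x+2)^2}\right) - \left(\frac{n}{(x+y)^2} + \frac{n}{(x+y+1)^2} + \frac{n}{(x+y+2)^2} + \frac{n-a-h}{(x+y+3)^2} + \frac{b+c+e+f}{(x+y+4)^2} + \frac{c+e}{(x+y+5)^2}\right)$,

$I_{xy} = -(\ldots)$,
                                                                                                    (15)
$I_{yy} = \left(\frac{n-h}{y^2} + \frac{n-g-h}{(y+1)^2} + \frac{a+b+c+e}{(y+2)^2} + \frac{b+c}{(y+3)^2}\right) - (\ldots)$,

where $(\ldots)$ in each case stands for the second member of $I_{xx}$.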
Some additional accuracy can no doubt be gained if the observed numbers in the several
numerators in (15) are replaced by their expectations. There seems to be no obvious way
of finding simple initial estimates, but with a few trial values it is not difficult to get suffi-
ciently accurate approximations. Scoring with x = 1.43 and y = 0.31 gives

$S_x$ = +0.4124 and $S_y$ = -0.2782.


The information and covariance matrices are

$\mathbf{I} = \begin{pmatrix} 14.6024 & -55.233 \\ -55.233 & \text{[illegible]} \end{pmatrix}$ and $\mathbf{V} = 10^{-2}\begin{pmatrix} 21.925 & +3.9861 \\ +3.9861 & 1.0538 \end{pmatrix}$. (16)

The required adjustments to the estimates are therefore +0.08 and +0.01. We may there-
fore write the maximum-likelihood estimates as

$\hat{x}_4$ = 1.51 ± 0.47 and $\hat{y}_4$ = 0.32 ± 0.10. (17)


Again, the expected values appearing in Table 4 show a very close fit to the observations. 
Grouping the first six classes into three pairs leaves 4 degrees of freedom altogether. The 
resulting goodness-of-fit $\chi^2$ with 2 degrees of freedom is 2.20, a value which is entirely
satisfactory. It hardly seems worth while at present to undertake a similarly modified 
treatment of the Lidwell chain model. 
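The fitted column of Table 4 and the grouped $\chi^2$ can be reproduced from the estimates in (17) by applying formula (4) to each Greenwood term. This is a sketch only: the helper names are hypothetical, and pooling consecutive classes (first with second, third with fourth, fifth with sixth) is an assumption about how the three pairs were formed.

```python
def rising(z, k):
    """Rising factorial z(z+1)...(z+k-1)."""
    out = 1.0
    for j in range(k):
        out *= z + j
    return out

def expected_counts(n, x, y):
    """Table 4 expectations: multiplicity * n * E[p^r q^s] for each chain."""
    # (multiplicity, r, s) read off the Greenwood terms m * p^r * q^s
    chains = [(1, 0, 3), (3, 1, 4), (6, 2, 4), (3, 2, 2),
              (6, 3, 3), (3, 3, 2), (3, 3, 1), (1, 3, 0)]
    return [m * n * rising(x, r) * rising(y, s) / rising(x + y, r + s)
            for m, r, s in chains]

observed = [4, 3, 1, 8, 4, 3, 10, 67]
fitted = expected_counts(100, 1.51, 0.32)   # estimates quoted in (17)

# Pool the first six classes into three pairs before computing chi-square.
groups = [(0, 1), (2, 3), (4, 5), (6,), (7,)]
chi2 = sum((sum(observed[i] for i in grp) - sum(fitted[i] for i in grp))**2
           / sum(fitted[i] for i in grp) for grp in groups)

print([round(v, 1) for v in fitted], round(chi2, 2))
```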

Comparing the results given in (11) with those in (17) shows no significant difference 
between the estimates obtained for households of three and households of four. We may 
therefore combine these estimates. The weighted averages, with weights equal to the 
reciprocals of the variances, are 


$\hat{x}$ = 1.18 ± 0.22 and $\hat{y}$ = 0.279 ± 0.052. (18)
The average chance of infection, $\bar{p}$, is given by putting r = 1 in (5). Thus we obtain
$\bar{p} = x/(x+y)$ = 0.81, (19)

using the values appearing in (18). If in the future it proves possible to group data on the 
household distribution of the cases of a disease according to some such classifications as 
geographical area or social class, etc., it will be interesting to see whether p has a more 
circumscribed distribution within each of these groups. 
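The pooled values in (18) and the average chance of infection in (19) follow from the estimates in (11) and (17) by inverse-variance weighting. The sketch below treats x and y separately, as the text describes; the function name is hypothetical.

```python
def pool(estimates):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / se**2 for _, se in estimates]
    mean = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    return mean, (1.0 / sum(weights)) ** 0.5

x, se_x = pool([(1.092, 0.250), (1.51, 0.47)])    # from (11) and (17)
y, se_y = pool([(0.264, 0.061), (0.32, 0.10)])
p_bar = x / (x + y)                               # formula (5) with r = 1

print(round(x, 2), round(se_x, 2), round(y, 3), round(se_y, 3), round(p_bar, 2))
```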


4. SUMMARY 


Greenwood (1931) was the first to introduce the chain-binomial model to describe the 
household distribution of cases of a disease with a short period of high infectivity and 
a constant incubation period. Greenwood obtained satisfactory goodness-of-fits to the 
St Pancras measles data, at least so far as the distribution of the total number of cases was 
concerned. Wilson et al. (1939) showed that on their measles data for Providence, Rhode 
Island, even if the fit to the distribution of the total number of cases was satisfactory, the 
theory was quite inadequate when the distributions of the separate generations of the disease 
within a household were examined. Later, Greenwood (1949) suggested that this might be 
due to variation in the chance of infection, p, between households. In the present paper the 
Providence data are re-examined on the assumption that p varies between households 
according to a suitable $\beta$-distribution. Two parameters then have to be estimated. p appears
to have a J-shaped distribution, with an infinite ordinate at p = 1. Two groups of the
Providence data for households of three and four individuals were investigated, and in both 
cases excellent fits of the theory to the data were obtained. 
SPREAD OF DISEASES IN A RECTANGULAR PLANTATION
WITH VACANCIES 


By G. H. FREEMAN 
East Malling Research Station 


1. INTRODUCTION 


This paper discusses the spread of diseases from plant to plant in a rectangular plantation 
where there may be some missing plants, which will be described as ‘vacancies’. The 
problem is considered in two ways: (i) the degree of agglomeration of diseased plants on 
any one occasion, (ii) the spread of disease over a period of time. In the work which gave 
rise to this investigation the plants were hops, and the spread of the virus disease, nettlehead, 
over a period of five years was examined. 

The methods used for the first aspect of the investigation, which may conveniently be 
called spatial, derive from various writers, principally Krishna Iyer. Historically, the first 
attempt to deal with this problem was that of Todd (1940), although previously Cochran
(1936) had examined the linear case. Todd considered the distribution of pairs of adjacent 
plants, or points on a lattice, on the assumption that the underlying distribution was the 
binomial. The method of doublets, as he called it, has been followed in most of the subsequent 
work, although Finney (1947) showed empirically that Todd’s formula seriously over- 
estimated the variance, because in an agglomeration doublets do not occur independently 
of one another. Van der Plank (1946) applied Cochran’s and Todd’s methods to virological 
studies, but without knowing of Finney’s criticisms. Furthermore, he only considered 
doublets in one direction. 

Moran (1948) considered the distribution of doublets for a general lattice, not specifically 
rectangular, and also found third and fourth moments in a particular case of the rectangular 
lattice. Krishna Iyer gave general solutions to the problem for a rectangular lattice in two 
papers; diagonal pairs were excluded in the 1950 paper, but considered in that of 1949. 

Moran and Krishna Iyer both distinguished two cases: (i) where the probability of any 
plant being diseased is the same as that of any other plant being diseased and independent 
of the number of diseased plants, and (ii) where the number of diseased plants is known and 
interest attaches only to their distribution over the area. These cases are called free and 
non-free sampling respectively. Following the accepted notation, diseased plants are 
referred to here as black and healthy plants as white, and it is the distributions of both black- 
black (BB) joins and black-white (BW) joins that are of interest in practice. 

Todd also considered sets of three and four neighbouring plants, called respectively 
triplets and quadruplets, but for these Finney (1947) showed that the variance would be 
under-estimated on the binomial assumption. Sukhatme (1949) considered triplets and 
stated that the inclusion of diagonals gave a more efficient test than the exclusion of 
diagonals, i.e. resulted in smaller coefficients of variation. He showed, further, that when 
diagonals were included the coefficients of variation for triplets were larger than those for 
doublets. In consequence of these results no further attention is here paid to triplets. 

None of the above-mentioned writers has considered what happens when there are 
vacancies on the lattice, although Moran’s general formula can be adapted to this case. In 
the present paper the approach to the problem has been to generalize Krishna Iyer’s methods 
for a rectangular lattice to make them applicable where there are vacancies rather than to 
particularize Moran’s methods. The results must of course be the same, but Krishna Iyer’s 
methods are the more elegant, for calculating both the probabilities of the number of joins 
of a given type and the total number of joins. (For this latter, the method of his 1949 paper 
has been preferred.) Both the cases, including diagonals and excluding diagonals, are considered, 
these being referred to, by analogy with chess, as queen and rook cases respectively. 
In the particular example given later in the paper the queen case was thought appropriate, 
but formulae are given here for the rook case as well, for completeness. 

The methods used for the second aspect of the investigation, which may be called temporal, 
derive from the work of Barnard (1947) and Pearson (1947). Barnard distinguished three 
possible types of significance test for a 2 x 2 table—of which the second or 2 x 2 comparative 
trial is appropriate here—and Pearson obtained formulae for actually testing the signi- 
ficance of observed results. All that is done here on this aspect is to consider the relevant 
method of analysis and apply it. 


2. SPATIAL ASPECT 


Consider a rectangular lattice m by n with vacancies, where m + n = a, m × n = b. Suppose 
that there are u vacancies in the interior of the lattice, v on the border and w at the corners, 
where u + v + w = U. Points that are not vacancies will be called existent points. In the free 
sampling case let the probabilities of getting black and white points be respectively p and q, 
and in the non-free sampling case let the numbers of black and white points be respectively 
$n_1$ and $n_2$. These two colours need not necessarily account for all the existent points in either 
case. Only the non-free sampling case will be considered in detail, this being slightly the 
more complicated and also the more likely to be of practical use. The appropriate substitu- 
tions for free sampling will be indicated. All the probabilities derived below assume the 
points to have been selected at random and their colours then examined. 














In the non-free sampling case, then, the probability of any point being black is $n_1/(b-U)$ 
and of any point being white is $n_2/(b-U)$. Thus the probability of any join of two points being 
BB is 
$$\frac{n_1(n_1-1)}{(b-U)(b-U-1)} \quad\text{or}\quad \frac{n_1!\,(b-U-2)!}{(n_1-2)!\,(b-U)!}. \tag{1}$$
Similarly the probability of a BW join is 
$$\frac{2n_1 n_2}{(b-U)(b-U-1)}. \tag{2}$$


The corresponding probabilities for free sampling are $p^2$ and $2pq$. 

Further, two joins can arise in only two ways. Either they can come from three adjacent 
points, in which case they will be described as dependent pairs of joins, or they can come from 
four points in two discrete pairs, when they will be called independent pairs of joins. For 
non-free sampling the probability of obtaining three adjacent black points is 
$$\frac{n_1!\,(b-U-3)!}{(n_1-3)!\,(b-U)!}, \tag{3}$$
and of obtaining four black points in two discrete pairs is 
$$\frac{n_1!\,(b-U-4)!}{(n_1-4)!\,(b-U)!}. \tag{4}$$

For BW joins the position is slightly different. If three adjacent points are to make two 
dependent BW pairs the points must be either BWB or WBW. The sum of the probabilities 
of the different kinds of dependent BW pairs is thus 
$$\frac{n_1(n_1-1)n_2 + n_1 n_2(n_2-1)}{(b-U)(b-U-1)(b-U-2)} = \frac{n_1 n_2(n_1+n_2-2)}{(b-U)(b-U-1)(b-U-2)}. \tag{5}$$
The probability of two independent BW pairs is 
$$\frac{4\,n_1!\,n_2!\,(b-U-4)!}{(n_1-2)!\,(n_2-2)!\,(b-U)!}. \tag{6}$$

The probabilities corresponding to (3)-(6) for free sampling are respectively $p^3$, $p^4$, 
$pq(p+q)$, $4p^2q^2$. 
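The non-free probabilities (1) to (6) are ratios of falling factorials, so they can be evaluated without forming the full factorials. The following is a minimal Python sketch; the function names are illustrative and do not come from the paper, and `N` stands for the number of existent points, b − U. The population used at the end is hypothetical.

```python
from math import perm  # perm(n, k) = n! / (n - k)!, available from Python 3.8

def p_bb(n1, N):
    """Expression (1): a given join is black-black."""
    return perm(n1, 2) / perm(N, 2)

def p_bw(n1, n2, N):
    """Expression (2): a given join is black-white."""
    return 2 * n1 * n2 / perm(N, 2)

def p_bb_dependent(n1, N):
    """Expression (3): three adjacent points are all black."""
    return perm(n1, 3) / perm(N, 3)

def p_bb_independent(n1, N):
    """Expression (4): four points in two discrete pairs are all black."""
    return perm(n1, 4) / perm(N, 4)

def p_bw_dependent(n1, n2, N):
    """Expression (5): three adjacent points read BWB or WBW."""
    return n1 * n2 * (n1 + n2 - 2) / perm(N, 3)

def p_bw_independent(n1, n2, N):
    """Expression (6): two discrete pairs are each black-white."""
    return 4 * perm(n1, 2) * perm(n2, 2) / perm(N, 4)

# Hypothetical large population with p = 0.4 and q = 0.6.
n1, n2 = 400_000, 600_000
N = n1 + n2
p, q = n1 / N, n2 / N
print(p_bb(n1, N), p ** 2)
print(p_bw_dependent(n1, n2, N), p * q * (p + q))
print(p_bw_independent(n1, n2, N), 4 * p ** 2 * q ** 2)
```

With a large number of existent points and fixed proportions of the two colours, each non-free value is close to its free-sampling counterpart, which is a convenient check on a hand calculation.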

Now Krishna Iyer (1950) has shown that the rth factorial moment about zero for the 
distributions of both BB and BW joins is r! times the sum of the expectations of the different 
ways of obtaining r joins of that type in the lattice. The argument employed in that paper 
(pp. 199-201) can be readily modified to show that the above statement holds for both rook 
and queen cases, even if there are vacancies. 

All that is necessary, therefore, is to calculate the numbers of joins or sets of joins of a 
given type for the particular case under examination, and this is done below. 

(a) Queen case. On the lattice there are, in all, 4 corner points, $2(a-4)$ border points 
and $b-2a+4$ interior points. Thus the total number of joins of existent points on the 
lattice is $A$, where 
$$2A = 3 \cdot 4 + 5 \cdot 2(a-4) + 8(b-2a+4) - 2V,$$
$V$ being a correction for vacancies, i.e. 
$$A = 4b - 3a + 2 - V. \tag{7}$$

Now each interior vacancy contributes 8 to V, each border vacancy 5 and each corner 
vacancy 3, except that two neighbouring vacancies reduce V by 1. Thus, if p is the number 
of pairs of neighbouring vacancies, 
$$V = 8u + 5v + 3w - p. \tag{8}$$
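Equations (7) and (8) can be checked against a direct count of joins. The sketch below is only an illustration: the function names and the small 4 × 5 plantation with four vacancies are hypothetical, and the lattice is assumed to have at least two rows and two columns.

```python
from itertools import product

# One representative of each undirected queen direction, so every join is counted once.
HALF_QUEEN = [(0, 1), (1, 0), (1, 1), (1, -1)]

def count_joins_direct(m, n, vacant):
    """Count queen-case joins between existent points by enumeration."""
    total = 0
    for r, c in product(range(m), range(n)):
        if (r, c) in vacant:
            continue
        for dr, dc in HALF_QUEEN:
            rr, cc = r + dr, c + dc
            if 0 <= rr < m and 0 <= cc < n and (rr, cc) not in vacant:
                total += 1
    return total

def count_joins_formula(m, n, vacant):
    """Evaluate A = 4b - 3a + 2 - V, with V taken from equation (8)."""
    a, b = m + n, m * n
    u = v = w = 0
    for r, c in vacant:
        on_edges = (r in (0, m - 1)) + (c in (0, n - 1))
        if on_edges == 2:
            w += 1  # corner vacancy
        elif on_edges == 1:
            v += 1  # border vacancy
        else:
            u += 1  # interior vacancy
    # Pairs of neighbouring vacancies, each pair counted once.
    pairs = sum((r + dr, c + dc) in vacant
                for r, c in vacant for dr, dc in HALF_QUEEN)
    V = 8 * u + 5 * v + 3 * w - pairs
    return 4 * b - 3 * a + 2 - V

vacant = {(0, 0), (1, 1), (1, 2), (3, 2)}  # hypothetical vacancies
print(count_joins_direct(4, 5, vacant), count_joins_formula(4, 5, vacant))  # 33 33
```

The subtraction of p in (8) corresponds to the join between two neighbouring vacancies, which would otherwise be removed twice.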

If $\mu$ is the expected number of joins of a given type we have, for non-free sampling from 
the appropriate expression (1) or (2): for BB joins 
$$\mu = \frac{A\,n_1!\,(b-U-2)!}{(n_1-2)!\,(b-U)!} \tag{9}$$
and for BW joins 
$$\mu = \frac{2A\,n_1 n_2}{(b-U)(b-U-1)}, \tag{10}$$
where A is derived from (7) and (8). For free sampling $p^2$ and $2pq$ are substituted for (1) 
and (2) respectively. 

The numbers of dependent and independent pairs of joins must now be considered. In 
the first case, where there are three adjacent points, two joins can arise in B ways, where 
$$B = 3 \cdot 4 + 10 \cdot 2(a-4) + 28(b-2a+4) - W,$$
W being a correction for vacancies, i.e. 
$$B = 4(7b - 9a + 11) - W. \tag{11}$$

Now each interior vacancy contributes 28 to W directly, each border vacancy 10 and each 
corner vacancy 3. Also each interior existent point with one neighbouring vacancy contributes 
a further 7 to W, one with two neighbouring vacancies 7 + 6 (= 13), one with three 
7 + 6 + 5, etc. Similarly, border existent points with one neighbouring vacancy contribute 
4 to W, those with two 4 + 3, etc. Finally, corner existent points with one neighbouring 
vacancy contribute 2 to W, etc. Let there be $x_i$ interior existent points with i neighbouring 
vacancies, $y_i$ border existent points with i neighbouring vacancies and $z_i$ corner existent 
points with i neighbouring vacancies. Then 
$$W = 28u + 10v + 3w + (7x_1 + 13x_2 + 18x_3 + 22x_4 + 25x_5 + 27x_6 + 28x_7 + 28x_8) + (4y_1 + 7y_2 + 9y_3 + 10y_4 + 10y_5) + (2z_1 + 3z_2 + 3z_3). \tag{12}$$
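The coefficients in (12) are differences of binomial coefficients: a point with k existent neighbours is the middle point of k(k − 1)/2 dependent pairs of joins. The sketch below, under that reading, compares a direct count of dependent pairs with equation (11). All names and the example lattice are hypothetical, and the lattice is assumed to have at least two rows and two columns.

```python
from itertools import product
from math import comb

QUEEN = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

# Contributions to W in equation (12), keyed by the number of lattice
# neighbours of a point: 8 (interior), 5 (border), 3 (corner).
DIRECT = {8: 28, 5: 10, 3: 3}
LOST = {8: [0, 7, 13, 18, 22, 25, 27, 28, 28],
        5: [0, 4, 7, 9, 10, 10],
        3: [0, 2, 3, 3]}

def neighbours(m, n, r, c):
    """Lattice positions adjacent to (r, c) in the queen case."""
    return [(r + dr, c + dc) for dr, dc in QUEEN
            if 0 <= r + dr < m and 0 <= c + dc < n]

def dependent_pairs_direct(m, n, vacant):
    """Sum k(k - 1)/2 over existent points with k existent neighbours."""
    total = 0
    for point in product(range(m), range(n)):
        if point in vacant:
            continue
        k = sum(nb not in vacant for nb in neighbours(m, n, *point))
        total += comb(k, 2)
    return total

def dependent_pairs_formula(m, n, vacant):
    """Evaluate B = 4(7b - 9a + 11) - W, with W built up as in equation (12)."""
    a, b = m + n, m * n
    W = 0
    for point in product(range(m), range(n)):
        nbrs = neighbours(m, n, *point)
        if point in vacant:
            W += DIRECT[len(nbrs)]
        else:
            W += LOST[len(nbrs)][sum(nb in vacant for nb in nbrs)]
    return 4 * (7 * b - 9 * a + 11) - W

vacant = {(0, 0), (1, 1), (1, 2), (3, 2)}  # hypothetical vacancies on a 4 x 5 lattice
print(dependent_pairs_direct(4, 5, vacant) == dependent_pairs_formula(4, 5, vacant))
```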
Since the total number of pairs of joins is $A(A-1)/2$, and pairs can only be dependent or 
independent, the number of independent pairs is $A(A-1)/2 - B$. 

For BB joins we thus have, from (3) and (4), that the second factorial moment is $\mu'_{[2]}$, 
where 
$$\mu'_{[2]} = \frac{2B\,n_1!\,(b-U-3)!}{(n_1-3)!\,(b-U)!} + \frac{[A(A-1)-2B]\,n_1!\,(b-U-4)!}{(n_1-4)!\,(b-U)!}.$$
Now if $\sigma^2$ is the variance, 
$$\sigma^2 = \mu'_{[2]} + \mu - \mu^2. \tag{13}$$
Hence 
$$\sigma^2 = \frac{2B\,n_1!\,(b-U-3)!}{(n_1-3)!\,(b-U)!} + \frac{A\,n_1!\,(b-U-2)!}{(n_1-2)!\,(b-U)!} + \frac{[A(A-1)-2B]\,n_1!\,(b-U-4)!}{(n_1-4)!\,(b-U)!} - \left[\frac{A\,n_1!\,(b-U-2)!}{(n_1-2)!\,(b-U)!}\right]^2. \tag{14}$$
Similarly, we have for BW joins, by substituting in (13) from (5), (6) and (10), 
$$\sigma^2 = \frac{2B\,n_1 n_2(n_1+n_2-2)}{(b-U)(b-U-1)(b-U-2)} + \frac{2A\,n_1 n_2}{(b-U)(b-U-1)} + \frac{4[A(A-1)-2B]\,n_1!\,n_2!\,(b-U-4)!}{(n_1-2)!\,(n_2-2)!\,(b-U)!} - \left[\frac{2A\,n_1 n_2}{(b-U)(b-U-1)}\right]^2. \tag{15}$$


The only change necessary for the free sampling case is the substitution of the appropriate 
probability expressions in the first and second factorial moments. For both free and non- 
free sampling Krishna Iyer has shown that the distributions tend to the normal form as the 
size of the lattice increases, if there are no vacancies. Thus, if the number of vacancies 
remains finite these distributions here will also tend to the normal in the limiting case. This 
will be true for both queen and rook cases. 
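For free sampling the substitution described above gives short closed forms for the mean and variance. The following Python sketch writes them out; the function name is illustrative, and the probabilities used in the usage line are hypothetical values chosen only to exercise the code.

```python
def free_sampling_moments(A, B, p, q):
    """Means and variances of BB and BW joins under free sampling.

    A is the number of joins and B the number of dependent pairs of joins;
    p and q are the probabilities of a black and a white point.
    """
    independent_x2 = A * (A - 1) - 2 * B  # twice the number of independent pairs

    mu_bb = A * p ** 2
    var_bb = 2 * B * p ** 3 + independent_x2 * p ** 4 + mu_bb - mu_bb ** 2

    mu_bw = 2 * A * p * q
    var_bw = (2 * B * p * q * (p + q)
              + independent_x2 * 4 * p ** 2 * q ** 2
              + mu_bw - mu_bw ** 2)
    return (mu_bb, var_bb), (mu_bw, var_bw)

# Hypothetical lattice summary and colour probabilities.
(bb_mean, bb_var), (bw_mean, bw_var) = free_sampling_moments(1106, 6874, 0.4, 0.6)
print(bb_mean, bb_var, bw_mean, bw_var)
```

For a single join (A = 1, B = 0) these reduce to the mean and variance of one Bernoulli trial, which is a quick way to confirm the signs of the terms.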

(b) Rook case. The number of points on the lattice being the same as before, the total 
number of joins of existent points on the lattice is $A'$, where 
$$2A' = 2 \cdot 4 + 3 \cdot 2(a-4) + 4(b-2a+4) - 2V',$$
$V'$ being a correction for vacancies, i.e. 
$$A' = 2b - a - V'. \tag{16}$$

Now each interior vacancy contributes 4 to $V'$, each border vacancy 3 and each corner 
vacancy 2, except that two neighbouring vacancies reduce $V'$ by 1. (Here, and for the remainder 
of the section devoted to the rook case, two points are only regarded as neighbours 
if they are next to one another on the rank or file.) Thus, if $p'$ is the number of pairs of 
vacancies, $p'$ being in general different from p as defined above, we have 
$$V' = 4u + 3v + 2w - p'. \tag{17}$$


The expressions (9) and (10) in the queen case are now replaced by the following values 
for the mean number of joins: for BB joins 
$$\mu = \frac{A'\,n_1!\,(b-U-2)!}{(n_1-2)!\,(b-U)!} \tag{18}$$
and for BW joins 
$$\mu = \frac{2A'\,n_1 n_2}{(b-U)(b-U-1)}, \tag{19}$$
$A'$ being derived from (16) and (17). For free sampling $p^2$ and $2pq$ are again substituted for 
(1) and (2) respectively. 

For the calculation of the second moment we again need to consider the number of ways 
of getting the two different types of pairs of joins. Now, three adjacent points can here be 
derived in $B'$ ways, where 
$$B' = 1 \cdot 4 + 3 \cdot 2(a-4) + 6(b-2a+4) - W',$$
$W'$ being a correction for vacancies, i.e. 
$$B' = 6b - 6a + 4 - W'. \tag{20}$$


Now each interior vacancy contributes 6 to $W'$ directly, each border vacancy 3 and each 
corner vacancy 2. Also each interior existent point with one neighbouring vacancy contributes 
a further 3 to $W'$, one with two neighbouring vacancies 3 + 2, etc. Similarly, border 
existent points with one neighbouring vacancy contribute 2, and so on, and corner existent 
points with one or two neighbouring vacancies 1. Let there be $x'_i$ interior existent points 
with i neighbouring vacancies, $y'_i$ border existent points with i neighbouring vacancies 
and $z'_i$ corner existent points with i neighbouring vacancies, where $x'_i$ is not in general the 
same as $x_i$, etc. Thus 
$$W' = 6u + 3v + 2w + (3x'_1 + 5x'_2 + 6x'_3 + 6x'_4) + (2y'_1 + 3y'_2 + 3y'_3) + (z'_1 + z'_2). \tag{21}$$
Since the total number of pairs of joins is $A'(A'-1)/2$, the number of independent pairs is, 
by the same reasoning as for the queen case, $A'(A'-1)/2 - B'$. 


Hence for BB joins we have, by substituting in (13) from (3), (4) and (9), the same expres- 
sion as (14) with A’ written for A and B’ for B. Similarly, the variance of the number of 
BW joins is given by making the necessary modifications in (15). As before, the changes 
required for the free sampling case are merely in the probability expressions. 


3. APPLICATIONS 


The method as outlined above is completely general, but its application can best be described 
by means of a specific example, for further details of which see the paper of Legg (1953). 
The apparent spread of hop nettlehead disease was considered over a period of five years, 
to determine whether it could be regarded as being due to a natural transport of inoculum 
from plant to plant. In order to do this it is essential to examine both spatial and temporal 
aspects of the spread. 

The spatial approach is necessary for the consideration of whether diseased plants occur 
in agglomerations, or whether they can be regarded as occurring randomly in space. The 
hops in this instance were planted at a distance of 6 ft. square, and the method of training 
them was such that each plant touched its neighbours, along the rows, across the rows and 
diagonally. This meant that the queen case suggested itself as being the one to use. The 
result of the analysis was that there was strong evidence, in three out of the five years at 
least, that diseased plants were more often found in agglomerations than would have been 
expected had these plants been randomly distributed. 

As an example of the use of the method Legg’s data for the year 1951 will be considered. 
Fig. 1 shows the distribution of plants showing nettlehead symptoms (black plants), those 
without symptoms (white plants), and missing plants (vacancies). The lattice is 11 × 30 
plants in area, and so a = 41, b = 330. Further, there are 122 plants with symptoms and 
196 without symptoms, and so these are the respective values of $n_1$ and $n_2$. 

The distribution of vacancies is such that u = 11, v = 1, w = 0, and there are no pairs of 
neighbouring vacancies, i.e. p = 0. We thus have 


$$U = 11 + 1 = 12 \quad\text{and}\quad V = (8 \times 11) + (5 \times 1) = 93.$$
This gives $b - U = 318$, and, from equation (7), 
$$A = (4 \times 330) - (3 \times 41) + 2 - 93 = 1106.$$


We now have to consider the existent points with various numbers of neighbouring 
vacancies, and we find that $x_1 = 66$, $x_2 = 8$, $x_3 = 1$, $y_1 = 8$, while the remaining x’s and y’s, 
and all the z’s, are zero. Thus, from equation (12), 
$$W = (28 \times 11) + (10 \times 1) + [(7 \times 66) + (13 \times 8) + (18 \times 1)] + (4 \times 8) = 934,$$
and so, from equation (11), 
$$B = 4[(7 \times 330) - (9 \times 41) + 11] - 934 = 6874.$$
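The arithmetic for the 1951 data can be retraced in a few lines. This is only a sketch of the substitution into equations (7), (8), (11) and (12); the variable names are illustrative, and the dictionaries map the number of neighbouring vacancies i to the counts $x_i$, $y_i$, $z_i$ quoted in the text.

```python
m, n = 11, 30
a, b = m + n, m * n

u, v, w, pairs = 11, 1, 0, 0      # interior, border, corner vacancies; neighbouring pairs
x = {1: 66, 2: 8, 3: 1}           # interior existent points by neighbouring vacancies
y = {1: 8}                        # border existent points by neighbouring vacancies
z = {}                            # corner existent points by neighbouring vacancies

# Coefficients of equation (12).
X_COEF = {1: 7, 2: 13, 3: 18, 4: 22, 5: 25, 6: 27, 7: 28, 8: 28}
Y_COEF = {1: 4, 2: 7, 3: 9, 4: 10, 5: 10}
Z_COEF = {1: 2, 2: 3, 3: 3}

U = u + v + w
V = 8 * u + 5 * v + 3 * w - pairs
A = 4 * b - 3 * a + 2 - V
W = (28 * u + 10 * v + 3 * w
     + sum(X_COEF[i] * count for i, count in x.items())
     + sum(Y_COEF[i] * count for i, count in y.items())
     + sum(Z_COEF[i] * count for i, count in z.items()))
B = 4 * (7 * b - 9 * a + 11) - W

print(U, V, A, W, B)  # 12 93 1106 934 6874
```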


Before the next stage of the computation it is convenient to find $A(A-1) - 2B$, which 
equals (1106 × 1105) − (2 × 6874) = 1208382. For BB joins we then have 


$$\mu = \frac{1106 \times 122 \times 121}{318 \times 317} = 161.96$$
and 
$$\sigma^2 = \mu + \frac{2 \times 6874 \times 122 \times 121 \times 120}{318 \times 317 \times 316} + \frac{1208382 \times 122 \times 121 \times 120 \times 119}{318 \times 317 \times 316 \times 315} - \mu^2,$$
which reduces to 80.66, whence $\sigma = 9.0$. 

Similarly, for BW joins, we find $\mu = 524.7$, $\sigma = 15.8$. 

The observed numbers of BB and BW joins are respectively 220 and 408, the former of 
which exceeds expectation by about six and a half times the standard error, and the latter 
of which falls short of expectation by more than seven times the standard error. By thus 
using the standard error as a measure of the discrepancy between theory and observation 
it appears from the combination of these two results that it is very unlikely that the observed 
distribution of diseased plants was in fact due to a random distribution of these plants in 
the field. 
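Because equations (14) and (15) involve the difference of large, nearly equal quantities, it is worth evaluating them mechanically. The sketch below substitutes the 1951 values into the non-free expressions; the function names are illustrative, and `N` denotes b − U.

```python
from math import perm, sqrt  # perm(n, k) = n! / (n - k)!

def bb_moments(A, B, n1, N):
    """Mean (9) and variance (14) of the number of BB joins, non-free sampling."""
    mu = A * perm(n1, 2) / perm(N, 2)
    second_factorial = (2 * B * perm(n1, 3) / perm(N, 3)
                        + (A * (A - 1) - 2 * B) * perm(n1, 4) / perm(N, 4))
    return mu, second_factorial + mu - mu ** 2

def bw_moments(A, B, n1, n2, N):
    """Mean (10) and variance (15) of the number of BW joins, non-free sampling."""
    mu = 2 * A * n1 * n2 / perm(N, 2)
    second_factorial = (2 * B * n1 * n2 * (n1 + n2 - 2) / perm(N, 3)
                        + 4 * (A * (A - 1) - 2 * B)
                        * perm(n1, 2) * perm(n2, 2) / perm(N, 4))
    return mu, second_factorial + mu - mu ** 2

A, B, n1, n2, N = 1106, 6874, 122, 196, 318
results = [("BB", 220, bb_moments(A, B, n1, N)),
           ("BW", 408, bw_moments(A, B, n1, n2, N))]
for label, observed, (mu, var) in results:
    sd = sqrt(var)
    # Mean, standard error, and deviation of the observed count in standard-error units.
    print(label, round(mu, 2), round(sd, 2), round((observed - mu) / sd, 2))
```

The last figure on each line is the discrepancy between observation and expectation measured in units of the standard error.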

Fig. 1. Nettlehead status of plants in 1951 
N = Nettlehead plant. V = Vacancy. . = Plant without symptoms. 

In the case of nettlehead, which is a virus disease probably transmitted by insects, and 
with this particular method of growing hops, the queen case of the spatial aspect must be 
the one to use. This need not, however, always be the case, and there may be times when it 
is desirable to consider the spread of disease where diagonal joins are excluded. Furthermore, 
in the case of a disease believed to be spread by cultivations, these being carried out in one 
direction only, it may be appropriate to consider a linear lattice. (Since the formula for this 
case can be easily derived from that for a rectangular lattice by taking one of the dimensions 
to be of unit width, it is not given explicitly here.) There may, moreover, be cases where 
practically nothing is known about the disease under investigation, and then it may be useful 
to consider a sort of sequential approach, first using a linear lattice, and then, successively, 
the rook and queen cases for the rectangular lattice. If this is done, and one method shows 
evidence of spread but the next does not, the results may well shed some light on the method 
of spread of the disease in question, and thus provide clues to the agency of spread. 
Investigation of the spatial aspect of the matter is not in general enough by itself, and the 
temporal approach should be used in conjunction with it. The reason for this is that even 
definite evidence from the spatial approach cannot do more than suggest that diseased 
plants occur in clusters. This is not sufficient to prove that there is infectious spread, for there 
may be ecological or other factors which could give rise to the same pattern of diseased 
plants. For example, there could be localized conditions of soil or shelter giving rise 
to patches where the disease develops or where symptoms are expressed. Equally a 
physiological disorder can occur in patches. If, however, the temporal method shows that 
the disease is spreading throughout the plantation under investigation, while retaining the 
characteristic pattern of agglomerations, there would be strong evidence of genuinely 
infectious spread. 

The investigation of nettlehead disease provides one practical instance of the use of the 
two approaches, and a second example is given by another disease of hops, the fluctuating 
type of Verticillium wilt. In one particular locality the appearance in one year of agglomera- 
tions of diseased plants suggested a spread of the disease, but these agglomerations were 
found to be due to the influence of flood water in the winter in relation to the contour pattern 
in this hop garden. Here the use of the temporal approach would have immediately shown 
the spread to be apparent rather than real. 

The temporal method is used to examine whether plants next to diseased plants are more 
liable to go down with a disease than those all of whose neighbours are healthy. To revert 
to the nettlehead example, the investigation concerned itself with the virus status in any 
year of plants healthy in the preceding year, in order to see if the disease spread principally 
to those plants whose neighbours were already diseased. Any plant which was missing the 
second year but not diseased the first year was tacitly excluded. 

To investigate this, the method of Barnard’s 2 x 2 comparative trial was used. For, as 
Barnard states, the fundamental subject-matter of a 2 x 2 comparative trial is two popula- 
tions, and here we have the two populations of healthy plants: (i) those having some 
diseased neighbours, and (ii) those having all healthy neighbours. These populations being 
given, records are then taken in the next year to see whether the proportion of diseased 
plants is the same in the two populations. It will be seen that, whereas there may be some 
doubt as to which type of spatial approach to use, the essential supplement using the 
temporal approach must make use of the 2 x 2 comparative trial. It is obvious that the same 
definition of ‘neighbours’ will obtain for the two aspects of the same investigation. 

Fig. 2 shows the distribution of plants showing nettlehead in 1951 and not in 1950 in 
relation to plants that showed nettlehead in 1950. The division of the population described 
above shows the following numbers of plants in each category: 

















                        Plants in 1951 healthy in 1950 
Neighbours in 1950      Diseased    Healthy    Total 
Some diseased              53          88       141 
All healthy                12          90       102 
Total                      65         178       243 

















In Barnard’s notation, u being a variate distributed normally with unit variance, we 
then have 
$$u^2 = \frac{[(53 \times 90) - (12 \times 88)]^2 \times 242}{141 \times 102 \times 65 \times 178},$$
whence u = 4.48 with probability P < 0.001. 

It thus appears that diseased plants are very much more likely to occur if some of their 
neighbours were previously diseased than if all their neighbours were previously healthy, 
which suggests that there is some form of infectious spread taking place. 
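The normal deviate for the 2 × 2 comparative trial can be recomputed from the table. The sketch below assumes the multiplier N − 1 (here 242), which reproduces the quoted value u = 4.48; the function name is illustrative, and the one-sided tail area is obtained from the complementary error function.

```python
import math

def comparative_trial_u(a, b, c, d):
    """Normal deviate for the 2 x 2 table [[a, b], [c, d]].

    Assumes u^2 = (ad - bc)^2 (N - 1) / (product of the four marginal totals).
    """
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    total = row1 + row2
    cross = a * d - b * c
    u_squared = cross ** 2 * (total - 1) / (row1 * row2 * col1 * col2)
    return math.copysign(math.sqrt(u_squared), cross)

# Rows: some diseased neighbours, all healthy neighbours.
# Columns: diseased in 1951, healthy in 1951.
u = comparative_trial_u(53, 88, 12, 90)
upper_tail = 0.5 * math.erfc(u / math.sqrt(2))  # P(U > u) for a unit normal variate
print(round(u, 2), upper_tail < 0.001)  # 4.48 True
```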

Fig. 2. Nettlehead status of plants in 1951 in relation to that of 1950 
O = Nettlehead plant in 1950, irrespective of whether it had nettlehead, had been replanted or was 
a vacancy in 1951. 
N = Nettlehead plant in 1951 but not 1950. 
V = Vacancy in 1951, not nettlehead in 1950. 
. = Plant without symptoms in both 1951 and 1950. 

In the nettlehead example, the results of the investigation of the temporal aspect agreed 
very closely with those of the spatial aspect because those years when the spatial approach 
suggested agglomeration were those in which there was a tendency for more newly diseased 
plants to be next to those diseased the previous year. Also, in this instance, the disease had 
spread so far across the recorded area by the end of the period under investigation as to 
suggest that the liability of plants to infection was not related to their position, and thus 
there was less likelihood of ecological factors accounting for the spread. Further, in general, 
when there were significantly more BB joins than expectation there were significantly fewer 
BW joins, and when one did not differ significantly from expectation neither did the other. 

In most practical cases plants will be either diseased or healthy, and so the black and 
white colours are likely to account for all the existent points on the lattice. There could, 
however, be cases, such as those of a complex of viruses, where more colours were possible. 
Also, there could be cases where some of the plants were carriers of the disease without 
showing symptoms themselves. It would then probably be desirable to treat such plants 
as vacancies from the spatial point of view but as diseased neighbours the previous year in 
the temporal approach. 


4. SUMMARY 


This paper considers the spread of disease in a rectangular plantation where there may be 
some missing plants, or vacancies. The problem is considered from two points of view, spatial 
and temporal. In the former, formulae are given which modify the existing methods derived 
by Krishna Iyer to allow for vacancies. In the latter, the 2 x 2 comparative trial of Barnard 
is used. Both methods are applied to a particular example of the apparent spread of a hop 
virus disease, nettlehead. In this case the results achieved by the two methods supplement 
each other, and, in general, both must be used before it can be concluded whether the apparent 
spread corresponds to a spread of inoculum. 


I am much indebted to Dr S. C. Pearce for helpful discussion on the statistical aspects, to 
Mr J. T. Legg for his ready co-operation on the plant pathology side of the study and to 
Dr R. V. Harris for bringing to my notice the example relating to Verticillium wilt. 

TESTS OF SIGNIFICANCE FOR CONCURRENT REGRESSION LINES 


By E. J. WILLIAMS 
Commonwealth Scientific and Industrial Research Organization, Melbourne 


The fitting of a straight regression line to a set of observations on two variates is a straight- 
forward application of the method of least squares. Where there are several sets of such data, 
it is of interest to fit regression lines for each set, and to determine whether or not the slopes 
of the lines differ more than would be expected due to chance variations. Where the slopes 
of a set of lines do not differ significantly, the interpretation of the data is particularly 
straightforward, since the comparison of the sets may be made simply by comparing the 
constant terms of the regression equations. In fact, if these terms also do not differ signi- 
ficantly when a single slope is used for all lines, then the regression may be regarded as 
completely homogeneous, and a single equation used to represent the relationship for all 
the data. 

The case of parallel regression lines is particularly important in many applications; for 
example, the comparison of two materials by means of biological assay can be carried out 
satisfactorily only when the regression of response (suitably measured) on dosage is linear, 
and has the same slope for each material. The relative potency is then measured as the 
distance between the parallel lines, measured in a direction parallel to the dosage axis. 
When the regression lines are not parallel the interpretation is difficult or impossible. 

Another case which appears to be important but has not received so much attention is 
the case in which the regression lines, instead of being parallel, are concurrent. This type of 
effect is likely to occur when different materials produce different rates of response, while 
all give the same response at some fixed level. This level in practice is often not zero, and 
often not exactly known. When three or more regression lines are concurrent, it is possible 
to measure the difference between the effects of different materials or treatments by means 
of the ratios of the distances between the lines measured along an ordinate. It will be seen 
that parallel regression lines are simply a special case of concurrent lines; in fact, the ratio 
of the effectiveness of different materials may be measured in precisely the same way as for 
concurrent lines, that is, by the distances between the intercepts on a transversal. 

Tocher (1952) has recently described situations in which it is of interest to know whether 
the regression lines for different sets of experimental results are concurrent. He has given 
a test of significance of the hypothesis of concurrence, and a method of determining fiducial 
limits for the abscissa (i.e. the value of the independent variable) of the point of concurrence. 
The purpose of this note is to give what is considered to be a more satisfactory solution of 
these two problems, as well as to discuss further applications of the fitting of concurrent 
lines. 

When the point of concurrence of the lines is known in advance, or only the ordinate 
requires estimation, the problem is, as pointed out by Tocher, amenable to standard methods. 
When the abscissa requires estimation, the sums of squares required for the significance 
tests given here are found to be latent roots of a certain matrix. The analysis and tests of 
significance are found to parallel those given (Williams, 1952b) for the interpretation of 
interactions in factorial experiments. 

FORMULATION OF THE PROBLEM 


Our notation differs from that of Tocher, which is not always convenient for the present 
discussion. We consider m sets of values of x and y; the pairs of values, $n_i$ in number, in the 
ith set are denoted by $x_i$ and $y_i$. We confine ourselves to the special case 
$$\sum x_i = \bar{x}\,n_i, \qquad \sum x_i^2 = X^2 n_i, \tag{1}$$
($\bar{x}$ and $X^2$ being constants; $X \neq 0$ since the $x_i$ cannot all vanish), which includes the 
commonly occurring case of all sets of data having the same values of x. Without loss of 
generality, in the mathematical discussion we take 
$$\bar{x} = 0, \tag{2}$$
$$\sum_i \sum y_i = 0. \tag{3}$$
We also put 
$$\sum y_i = n_i \bar{y}_i, \tag{4}$$
$$\sum x_i y_i = n_i q_i, \tag{5}$$
$$\sum_i n_i = N, \tag{6}$$
$$\sum_i n_i q_i = Nq, \tag{7}$$
and take the point of concurrence as $(\xi, \eta)$. 


(i) Ordinate of concurrence given 
We assume that $\eta$ is known, and only $\xi$ is to be estimated. The regression coefficient for 
the ith line through $(\xi, \eta)$ is 
$$b_i = \frac{\sum (y_i - \eta)(x_i - \xi)}{\sum (x_i - \xi)^2} = \frac{q_i - \xi(\bar{y}_i - \eta)}{X^2 + \xi^2}. \tag{8}$$
The sum of squares for regressions through $(\xi, \eta)$, with m degrees of freedom, is accordingly 
$$S = \sum_i n_i b_i^2 (X^2 + \xi^2) = \frac{\sum_i n_i q_i^2 - 2\xi \sum_i n_i q_i(\bar{y}_i - \eta) + \xi^2 \sum_i n_i(\bar{y}_i - \eta)^2}{X^2 + \xi^2}. \tag{9}$$

Since the total sum of squares about $\eta$ is independent of $\xi$, to estimate $\xi$ we may equally 
well maximize S as minimize the residual sum of squares. Put 
$$\frac{\sum_i n_i q_i^2}{X^2} = A, \qquad \frac{\sum_i n_i q_i(\bar{y}_i - \eta)}{X} = B, \qquad \sum_i n_i(\bar{y}_i - \eta)^2 = C.$$
Then (9) gives 
$$(X^2 + \xi^2)S = AX^2 - 2BX\xi + C\xi^2, \tag{9'}$$
and S is maximized if 
$$\xi S = -BX + C\xi \tag{10}$$
or 
$$XS = AX - B\xi. \tag{10'}$$

From (10) and (10') we see that the maximized S must be the larger latent root of the matrix 
$$\begin{pmatrix} A & -B \\ -B & C \end{pmatrix}. \tag{11}$$

The two latent roots, $S_1$ and $S_2$, may be given a practical interpretation. $S_1$ is the sum of 
squares for regressions and the point of concurrence, and thus has m + 1 degrees of freedom. 
$S_2$, with m − 1 degrees of freedom, is the sum of squares representing departures from 
concurrence. Corresponding to the latent root $S_1$, we have, from (10), 
$$\xi = -\frac{\sum_i n_i q_i(\bar{y}_i - \eta)}{S_1 - \sum_i n_i(\bar{y}_i - \eta)^2} = -\frac{XB}{S_1 - C}. \tag{12}$$
From the form of the matrix we see that $S_2$ is zero if and only if 
$$q_i \propto \bar{y}_i - \eta; \tag{13}$$
and from (12), that in such cases the coefficient of proportionality is $-X^2/\xi$. Then $b_i = q_i/X^2$, 
from (8), and the unrestricted regression lines for each set all pass through the point of 
concurrence. In particular, there is no departure from concurrence when either all the $q_i$ 
are zero, so that the lines are parallel, or all the $\bar{y}_i = \eta$, so that the point of means is the point 
of concurrence. 
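The steps of case (i) can be sketched numerically as follows. The code forms A, B and C from the data, takes the latent roots of the 2 × 2 matrix (11) in closed form, and recovers the abscissa from (12). The function name and the data are hypothetical: three exactly concurrent lines sharing the same x values, so that $S_2$ should vanish and the assumed abscissa should be returned.

```python
import math

def concurrence_roots(xs, y_sets, eta):
    """Latent roots S1 >= S2 of matrix (11) and the abscissa from (12).

    xs: common x values for every set, centred so that their mean is zero.
    y_sets: one list of y values per regression line, aligned with xs.
    eta: the known ordinate of the point of concurrence.
    """
    n = len(xs)
    assert abs(sum(xs)) < 1e-12, "x values must be centred, as in (2)"
    X2 = sum(x * x for x in xs) / n
    X = math.sqrt(X2)
    A = B = C = 0.0
    for ys in y_sets:
        y_bar = sum(ys) / n
        q = sum(x * y for x, y in zip(xs, ys)) / n
        A += n * q * q / X2
        B += n * q * (y_bar - eta) / X
        C += n * (y_bar - eta) ** 2
    half_trace = (A + C) / 2
    radius = math.sqrt(((A - C) / 2) ** 2 + B * B)
    S1, S2 = half_trace + radius, half_trace - radius
    xi_hat = -X * B / (S1 - C)  # undefined if B = 0 and C >= A
    return S1, S2, xi_hat

# Hypothetical data: lines of slope 0.5, 1.0 and -0.25 through (3, 1).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
xi, eta = 3.0, 1.0
y_sets = [[eta + slope * (x - xi) for x in xs] for slope in (0.5, 1.0, -0.25)]
S1, S2, xi_hat = concurrence_roots(xs, y_sets, eta)
print(S1, S2, xi_hat)
```

With real data $S_2$ will not be zero, and its size relative to the residual mean square is what the test of departures from concurrence examines.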


(ii) Ordinate of concurrence to be estimated 
When $\eta$ is not given, it may be so chosen as to minimize the residual sum of squares. 
We then find 
$$\eta = \xi q/X^2. \tag{14}$$
But $q/X^2$ is the average unrestricted regression coefficient for all the data; therefore, as 
might be expected, the point of concurrence lies on the line of mean regression when $\eta$ is 
estimated from the data. It is therefore appropriate to consider departures from mean 
regression only in testing the significance of departures from concurrence. 
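The step leading to (14) can be filled in as follows. This is a sketch of one route to the result, taking the origin at the general mean as in (2) and (3), holding $\xi$ fixed, and writing R for the residual sum of squares; the symbol R is introduced here only for the derivation.

```latex
\begin{aligned}
R(\eta) &= \sum_i \sum (y_i - \eta)^2 - S, \\
\frac{\partial}{\partial \eta} \sum_i \sum (y_i - \eta)^2
  &= -2 \sum_i n_i (\bar{y}_i - \eta) = 2N\eta, \\
\frac{\partial S}{\partial \eta}
  &= \frac{2\xi \sum_i n_i q_i - 2\xi^2 \sum_i n_i (\bar{y}_i - \eta)}{X^2 + \xi^2}
   = \frac{2N\xi (q + \xi\eta)}{X^2 + \xi^2}, \\
\frac{\partial R}{\partial \eta} = 0
  &\;\Longrightarrow\; \eta (X^2 + \xi^2) = \xi q + \xi^2 \eta
  \;\Longrightarrow\; \eta = \xi q / X^2 .
\end{aligned}
```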
The sum of squares for departures of individual regressions from mean regression is 
$$S' = \Bigl(\sum_i n_i b_i^2 - N b^2\Bigr)(X^2 + \xi^2) = \frac{\sum_i n_i(q_i - q)^2 - 2\xi \sum_i n_i \bar{y}_i(q_i - q) + \xi^2 \sum_i n_i \bar{y}_i^2}{X^2 + \xi^2} = \frac{A'X^2 - 2B'X\xi + C'\xi^2}{X^2 + \xi^2}, \tag{15}$$
with m − 1 degrees of freedom. As with S, the maximum value of $S'$ is found as the larger 
latent root, with m degrees of freedom, of 
$$\begin{pmatrix} A' & -B' \\ -B' & C' \end{pmatrix}. \tag{16}$$
Again, the smaller latent root, with m − 2 degrees of freedom, provides a test of departures 
from concurrence. The effect of having to estimate $\eta$ is merely to replace the individual 
means and regressions by departures from overall means and regressions, and to reduce 
by one the degrees of freedom. 

TESTS OF SIGNIFICANCE 


In testing the significance of the differences among regression lines for several sets of data, 
the usual approach is to examine, first, whether the slopes of the lines differ significantly, 
and secondly, whether, if the slopes are assumed not to differ, the distances between the 
parallel lines fitted to the data differ significa: ly. For the present purpose, the two aspects 
are viewed in a different way. It is seen fron: the above discussion that the sum of the two 
latent roots involves both the differences among regressions and the differences among 
means, but that the larger latent root, representing differences among regression lines, may 
involve one or other aspect alone. 

The testing of significance of the latent roots of a matrix has been discussed elsewhere
(see, for example, Williams, 1952a,b, and references). The kind of test which seems
appropriate to the present problem is one which, first, tests the validity of the assumed model,
i.e. whether or not the departure from concurrence is significant, and secondly, gives
fiducial limits for ξ. We first consider case (i) above, with η known. Corresponding to a
given value of ξ we may determine S, the sum of squares for regressions through (ξ, η),
then calculate the quantities

$$v_1 = \frac{(S_1-S)(S-S_2)}{S} \tag{17}$$

and

$$v_2 = \frac{S_1S_2}{S}. \tag{18}$$


These are distributed as sums of squares with 1 and m − 1 degrees of freedom respectively.
v₂ gives an exact test of departures from concurrence, corresponding to ξ, and v₁ a test of
concordance of the point of concurrence (ξ, η) with the data. Provided v₂ is not significant,
v₁ may be used to give a fiducial limit for S at any given probability level, from (17).

It should be emphasized that, although it is often still possible formally to determine
fiducial limits for ξ even when v₂ is significant, such fiducial limits for the abscissa of
concurrence are irrelevant, since the hypothesis of concurrence is no longer tenable. It is
therefore important to test v₂ first before considering the estimation of ξ.

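The corresponding fiducial limits for ξ are derived from equations (9) and (17), and are
most conveniently given as follows:

$$\xi^2\Big[S-\sum n_i(\bar y_i-\eta)^2\Big] + 2\xi\sum n_i q_i(\bar y_i-\eta) + X^2S - \sum n_i q_i^2 = 0,$$

$$\xi = \frac{-\sum n_i q_i(\bar y_i-\eta) \pm X\sqrt{(Sv_1)}}{S-\sum n_i(\bar y_i-\eta)^2}
      = \frac{X\{B \pm \sqrt{(Sv_1)}\}}{C-S}. \tag{19}$$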


Alternatively, since v₁ is a sum of squares with 1 degree of freedom, its numerator may be
expressed as the square of a rational function of ξ. Since, from (12), v₁ vanishes when

$$\xi = -\frac{XB}{S_1-C} \quad\text{or}\quad \xi = \frac{XB}{C-S_2},$$

it appears that

$$Sv_1 = \frac{[(S_1-C)\xi+XB]^2\,[(C-S_2)\xi-XB]^2}{B^2(X^2+\xi^2)^2}
       = \frac{[\xi^2B+\xi X(C-A)-X^2B]^2}{(X^2+\xi^2)^2}.$$
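The relations between S, v₁ and v₂ can be checked numerically. The sketch below is an illustration, not part of the original analysis: it assumes that S for a trial abscissa ξ has the form (AX² − 2BXξ + Cξ²)/(X² + ξ²), that S₁ and S₂ are the latent roots of the corresponding 2 × 2 matrix, and it uses a hypothetical value of X. The values of A, B and C are those quoted for Tocher's data in the numerical example.

```python
import math

# Quantities quoted for Tocher's data in the numerical example (eta known).
A = 0.101358066
B = 0.922492213
C = 8.396071400


def latent_roots(a, b, c):
    """Latent roots (larger, smaller) of the symmetric matrix [[a, -b], [-b, c]]."""
    larger = (a + c) / 2 + math.hypot((a - c) / 2, b)
    smaller = (a * c - b * b) / larger  # product of the roots equals the determinant
    return larger, smaller


def s_for_abscissa(xi, big_x):
    """Assumed form of S for regressions through a point with abscissa xi."""
    return (A * big_x**2 - 2 * B * big_x * xi + C * xi**2) / (big_x**2 + xi**2)


def v_statistics(xi, big_x):
    """Return (S, v1, v2) following equations (17) and (18)."""
    s1, s2 = latent_roots(A, B, C)
    s = s_for_abscissa(xi, big_x)
    return s, (s1 - s) * (s - s2) / s, s1 * s2 / s


def s_v1_closed_form(xi, big_x):
    """Squared rational function of xi that should equal S * v1."""
    num = xi**2 * B + xi * big_x * (C - A) - big_x**2 * B
    return (num / (big_x**2 + xi**2)) ** 2


if __name__ == "__main__":
    X = 1.0  # hypothetical: X is not restated in this part of the paper
    for xi in (-3.0, 0.5, 7.0):
        s, v1, v2 = v_statistics(xi, X)
        print(xi, s, v1, v2, s * v1 - s_v1_closed_form(xi, X))
```

For any trial ξ the last printed column should be zero up to rounding, and v₁ + v₂ should equal S₁ + S₂ − S.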
The non-linear analysis of variance may be set out as follows:

                                            Degrees
                                            of freedom    Sums of squares
Point of concurrence                        1             v₁ = (S₁ − S)(S − S₂)/S
Regressions through point of concurrence    m             S
Departures from concurrence                 m − 1         v₂ = S₁S₂/S
Total between lines                         2m            S₁ + S₂
Residual                                    N − 2m        by difference
Total                                       N             ΣΣ (y_ij − η)²

When m = 1, S₂ = 0, so that the test for a given point of concurrence reduces to a test of
significance of the difference S₁ − S.

For case (ii), when η is to be estimated, a similar approach may be used. Here, however,
we require to determine simultaneous fiducial limits for ξ and η. The limits will be given as
a closed curve surrounding the optimum point.

If the variation of η about the value given by (14) is not considered, we may calculate the
statistics

$$v_1' = \frac{(S_1'-S')(S'-S_2')}{S'}, \qquad v_2' = \frac{S_1'S_2'}{S'}, \tag{22}$$

with 1 and m − 2 degrees of freedom respectively. As for the case when η is known, these two
statistics may be used to test departures from the abscissa of concurrence ξ, and departures
from concurrence, respectively.

Now for the departure of a given value of η from its optimum value ξq̄/X², the sum of
squares, with 1 degree of freedom, is

$$\frac{N(\eta X^2-\xi\bar q)^2}{X^2(X^2+\xi^2)}.$$

For a simultaneous test of significance of the concordance of ξ and η with the data we
may therefore take

$$v_1' + \frac{N(\eta X^2-\xi\bar q)^2}{X^2(X^2+\xi^2)} \tag{23}$$

$$
\begin{aligned}
 &= S_1'+S_2'-S'-\frac{S_1'S_2'}{S'}+\frac{N(\eta X^2-\xi\bar q)^2}{X^2(X^2+\xi^2)} \\
 &= S_1+S_2-S-\frac{S_1'S_2'}{S'} \\
 &= v_1+v_2-v_2'.
\end{aligned}
\tag{24}
$$

This is a sum of squares with 2 degrees of freedom. The set of values of ξ and η giving a
constant value to (24) are the required fiducial limits for the position of the point of
concurrence.

COMPARISON WITH TOCHER’S RESULTS 


The present paper gives significance tests which are exact, and more detailed than those 
given by Tocher. In fairness it should be remarked, however, that we have treated only the 
special case defined by (1), and that there are unlikely to be exact tests for the general case 
treated by Tocher. As there are similarities between the two approaches, it is of interest to 
compare them. For convenience, we rewrite some of Tocher’s results in our notation; his 
equation numbers will be distinguished by the suffix T. 

Tocher first tests the significance of the restraint that the lines be concurrent at a point
with abscissa ξ. The criterion is a sum of squares with m degrees of freedom, which is
identical with our

$$S_1+S_2-S = v_1+v_2. \tag{25}$$

Its minimum value, which Tocher takes as critical for the test (14T), is clearly S₂, which is
reached when S = S₁, and ξ has its optimum value. S₂, however, is not distributed as a sum
of squares with m degrees of freedom, as Tocher assumes; nor would a reduction of degrees
of freedom to m − 1 be correct, since the equation for optimum ξ is not linear. Tocher's test
is, in fact, conservative, as he intends; that is, the significance of departures from
concurrence tends to be underestimated (see Williams, 1952a, p. 23).

The optimum value of ξ is that for which S = S₁. For fiducial limits, Tocher uses the
values of ξ corresponding to a value of S making v₁ + v₂ significant at a given probability
level (15T). The values of ξ are in fact given by (19), with the appropriate value of S. We
propose that the setting of limits be done in two stages. First, corresponding to any ξ, v₂,
representing departures from concurrence, is tested for significance. If v₂ is not significant,
fiducial limits for ξ are determined from the value of v₁ significant at a given probability
level. If v₂ is significant, the hypothesis of concurrence must be rejected, and no meaning
can be attached to a point of concurrence.

Either method of setting fiducial limits is theoretically correct. However, it is
considered that there is a conceptual advantage in distinguishing between the test of the
hypothesis of concurrence and the test of a given point of concurrence. When v₂ is small,
the use of v₁ also gives narrower fiducial limits than v₁ + v₂. This point is illustrated in the
numerical example given below.


NUMERICAL EXAMPLES
(i) Re-examination of Tocher's data
In order to compare the results given above with Tocher's, we now carry out a comparative
analysis of his results. It is noted that he takes η = 0. From his data (p. 115) we find

A = 0.1013 5806 6,
B = 0.9224 9221 3,
C = 8.3960 7140 0,
S₁ + S₂ = 8.4974 2946 6,
S₁S₂ = 0.0000 1767 7,
S₁ = 8.4974 2738 6,
S₂ = 0.0000 0208 0.

Residual mean square = 0.9223 1 × 10⁻⁶ (6 d.f.).

Since the 5 and 1% points of F, with 1 and 6 degrees of freedom, are 5.9874 and 13.745
respectively, a sum of squares with 1 degree of freedom, to be significant, must exceed
5.5222 × 10⁻⁶ (5% level),
or 12.677 × 10⁻⁶ (1% level).
Since, for all values of S considered henceforth, the value of v₂ is not significant, departures
from concurrence will be assumed non-existent.
We now determine fiducial limits for S, and hence for ξ, from the equation

v₁ = 8.4974 2946 6 − S − 0.0000 1767 7/S = 5.5222 × 10⁻⁶

and = 12.677 × 10⁻⁶.
The fiducial limits are as follows:

                     Present method                  Tocher's method
                     S                ξ              S                ξ
Optimum value        8.4974 2738 6    981.5          8.4974 2738 6    981.5
5%                   8.4974 2186 4    970 and 993    8.4974 1789 9    968 and 995
1%                   8.4974 1470 9    964 and 999    8.4974 0723 4    961 and 1002

The limits by Tocher's method, given for comparison, are found from (15T), or from
equation (19) with the appropriate value of S. These ranges are 17% wider than those from
the present method, owing to the inclusion of v₂ in the determination of S.
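The limiting values of S for the present method follow from solving v₁ = threshold as a quadratic in S. The sketch below is an illustration only: it takes S₁ + S₂, S₁S₂, the residual mean square and the F points exactly as quoted above, and the function name `limit_for_s` is ours.

```python
import math

S_SUM = 8.497429466        # S1 + S2, as quoted
S_PROD = 0.000017677       # S1 * S2, as quoted
RESIDUAL_MS = 0.92231e-6   # residual mean square, 6 d.f.
F_POINTS = {"5%": 5.9874, "1%": 13.745}  # F with 1 and 6 d.f., as quoted


def limit_for_s(threshold):
    """Solve S_SUM - S - S_PROD / S = threshold for the root near S1.

    Multiplying through by S gives S^2 - (S_SUM - threshold) * S + S_PROD = 0.
    The larger root is the one lying just below S1.
    """
    b = S_SUM - threshold
    return (b + math.sqrt(b * b - 4.0 * S_PROD)) / 2.0


limits = {level: limit_for_s(f * RESIDUAL_MS) for level, f in F_POINTS.items()}

if __name__ == "__main__":
    for level, s in limits.items():
        print(level, f"{s:.9f}")
```

The two values agree with the entries 8.4974 2186 4 and 8.4974 1470 9 in the table to the nine decimals quoted; the corresponding limits for ξ would then follow from (19).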


(ii) Data requiring estimated ordinate 
In a study of the effect of basis weight on the properties of laboratory-made sheets of 
paper, three batches of pulp were taken and beaten for different lengths of time. From each 
batch, sheets of six different basis weights were then made, and mechanical tests carried 
out on them. The burst strength results from this experiment are set out in Table 1, together 
with values calculated from them. 





Table 1. Burst strength results

                          Beating (revs of Lampen Mill)
Basis weight     x        1125       4500       12730      Total
                          i = 1      i = 2      i = 3
10              −5         1.73       1.98       3.11
20              −3         4.99       9.69      12.83
30              −1         8.74      17.26      22.69
40               1        12.60      24.52      32.45
50               3        17.04      31.36      48.22
60               5        21.18      43.29      59.94
6ȳ_i                      66.28     128.10     179.24     373.62
6q_i                     137.26     278.82     400.08     816.16


Table 2. Test of departures of regression lines from origin

                              Degrees        Sums of        Mean
                              of freedom     squares        square
Correction for mean            3              8821.604
Regression through means       3              3666.356
Sum                            6             12487.960
Regressions through origin     3             12311.932
Departures from origin         3               176.028      58.676**
Residual                      12                28.554       2.3795

** Significant at 1% level.


A preliminary analysis, as given in Table 2, shows that the three regression lines of burst 
strength on basis weight depart significantly from the origin. In order to compare the 
trends for different amounts of beating it would be desirable to base the comparison on 
regression lines through some other common point, if such a model were concordant with 
the data. The common point would represent the basis weight for which the value of burst 
factor was independent of beating. The ordinate as well as the abscissa of this point will be 
estimated. 
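The entries of Table 2 can be recomputed from the burst strength readings of Table 1. The sketch below is an illustration under two assumptions of ours: the independent variable is coded as x = (basis weight − 35)/5, and the "origin" is the point of zero basis weight and zero burst strength, i.e. x = −7, y = 0.

```python
X_CODED = [-5, -3, -1, 1, 3, 5]  # x = (basis weight - 35) / 5
BURST = {
    1125: [1.73, 4.99, 8.74, 12.60, 17.04, 21.18],
    4500: [1.98, 9.69, 17.26, 24.52, 31.36, 43.29],
    12730: [3.11, 12.83, 22.69, 32.45, 48.22, 59.94],
}


def ss_through_point(x0, y0):
    """Sum over the three lines of the regression sum of squares for a line through (x0, y0)."""
    total = 0.0
    for ys in BURST.values():
        sxy = sum((x - x0) * (y - y0) for x, y in zip(X_CODED, ys))
        sxx = sum((x - x0) ** 2 for x in X_CODED)
        total += sxy * sxy / sxx
    return total


def table2():
    """Recompute the sums of squares of Table 2."""
    n = len(X_CODED)
    sxx = sum(x * x for x in X_CODED)
    mean_corr = sum(sum(ys) ** 2 / n for ys in BURST.values())
    reg_means = sum(
        sum(x * y for x, y in zip(X_CODED, ys)) ** 2 / sxx for ys in BURST.values()
    )
    origin = ss_through_point(-7.0, 0.0)  # basis weight 0 corresponds to x = -7
    total = sum(y * y for ys in BURST.values() for y in ys)
    return {
        "correction for mean": mean_corr,
        "regression through means": reg_means,
        "regressions through origin": origin,
        "departures from origin": mean_corr + reg_means - origin,
        "residual": total - mean_corr - reg_means,
    }


if __name__ == "__main__":
    for name, value in table2().items():
        print(f"{name:28s} {value:12.3f}")
```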

For the independent variable we take

x = (basis weight − 35)/5,

so that x has the values −5, −3, −1, 1, 3, 5, and X² = 11⅔. The quantities required for the
analysis are given in Table 1.


The mean burst strength is 20.76, and the mean regression coefficient on x is

b = 816.16/210 = 3.886,

so that, by (14), the optimum value of the ordinate corresponding to ξ is

η = 3.886ξ + 20.76.


The matrix (16) is

$$
\begin{pmatrix} 494.36940 & -726.07978 \\ -726.07978 & 1066.49853 \end{pmatrix},
$$

whence S₁' = 1560.83437,
       S₂' = 0.03356.
The optimum value of (ξ, η) is (−5.02, 1.26) from (12) and (14). Departures from
concurrence are clearly not significant, so attention may be confined to v₁ + v₂ − v₂' and the
determination of the fiducial boundary for (ξ, η). From the residual mean square given in
Table 2, the values for v₁ + v₂ − v₂' required for significance may be determined as

18.490 (5% level),
and 32.964 (1% level).
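The elements of the matrix, its latent roots and the optimum point can be recomputed from Table 1. The sketch below is an illustration and rests on assumptions of ours that are consistent with the figures quoted: X² is the mean square of the coded x values (35/3), q_i is the mean product Σxy/6 for line i, and the elements are A' = 6Σ(q_i − q̄)²/X², B' = 6Σ(q_i − q̄)(ȳ_i − ȳ)/X and C' = 6Σ(ȳ_i − ȳ)².

```python
import math

X_CODED = [-5, -3, -1, 1, 3, 5]  # x = (basis weight - 35) / 5
BURST = [
    [1.73, 4.99, 8.74, 12.60, 17.04, 21.18],
    [1.98, 9.69, 17.26, 24.52, 31.36, 43.29],
    [3.11, 12.83, 22.69, 32.45, 48.22, 59.94],
]


def concurrence_matrix():
    """Return (A', B', C', X^2, mean of y, mean of q) under the assumptions stated above."""
    n = len(X_CODED)
    big_x2 = sum(x * x for x in X_CODED) / n  # assumed: mean square of x, 35/3
    y_bar = [sum(ys) / n for ys in BURST]
    q = [sum(x * y for x, y in zip(X_CODED, ys)) / n for ys in BURST]
    y_mean = sum(y_bar) / len(BURST)
    q_mean = sum(q) / len(BURST)
    a = n * sum((qi - q_mean) ** 2 for qi in q) / big_x2
    b = n * sum((qi - q_mean) * (yi - y_mean) for qi, yi in zip(q, y_bar)) / math.sqrt(big_x2)
    c = n * sum((yi - y_mean) ** 2 for yi in y_bar)
    return a, b, c, big_x2, y_mean, q_mean


def latent_roots(a, b, c):
    """Latent roots (larger, smaller) of [[a, -b], [-b, c]]."""
    larger = (a + c) / 2 + math.hypot((a - c) / 2, b)
    return larger, (a * c - b * b) / larger


def optimum_point():
    """Abscissa from the latent vector of the larger root; ordinate from the mean regression."""
    a, b, c, big_x2, y_mean, q_mean = concurrence_matrix()
    s1, _ = latent_roots(a, b, c)
    xi = math.sqrt(big_x2) * (a - s1) / b
    eta = y_mean + (q_mean / big_x2) * xi
    return xi, eta


if __name__ == "__main__":
    a, b, c, *_ = concurrence_matrix()
    print(a, b, c)
    print(latent_roots(a, b, c))
    print(optimum_point())
```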





The equation for the fiducial boundary is, then,

$$\frac{(726.12\xi^2+1954\xi-8471)^2}{(\xi^2+11\tfrac23)(1066\xi^2-4960\xi+5768)}
 + \frac{210(\eta-3.886\xi-20.76)^2}{\xi^2+11\tfrac23} = 18.490 \text{ and } 32.964.$$

Of particular interest are the maximum ranges for ξ and η. The maximum range for ξ is
given above when η does not depart from the value given by the mean regression line. Then,
from (22),

S' = 1542.34, ξ = −6.4 and −4.0 (5% level),
S' = 1527.87, ξ = −7.0 and −3.7 (1% level).

The maximum range for η is given when ξ has its optimum value −5.02; negative values
being inadmissible, the limits are found to be

0 and 3.1 (5% level),
0 and 3.7 (1% level).
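These ranges can be reproduced approximately from the quantities already quoted. The sketch below is an illustration with assumptions of ours: the lower-right element of the matrix is taken as 1066.49853 (the value implied by the two latent roots), X² is taken as 35/3, and the limits for η use the single-degree-of-freedom term 210(η − η̂)²/(ξ² + X²) at the rounded optimum (−5.02, 1.26).

```python
import math

A1, B1, C1 = 494.36940, 726.07978, 1066.49853  # C1 is inferred, see the note above
X2 = 35.0 / 3.0                                # assumed mean square of x
N_TIMES_X2 = 210.0                             # 18 observations times X^2


def s_limit(threshold):
    """Value of S' (near the larger root) at which v1' reaches the threshold."""
    b = (A1 + C1) - threshold
    det = A1 * C1 - B1 * B1
    return (b + math.sqrt(b * b - 4.0 * det)) / 2.0


def xi_limits(s):
    """Roots in xi of (s - C') xi^2 + 2 B' X xi + X^2 (s - A') = 0, in increasing order."""
    x = math.sqrt(X2)
    qa, qb, qc = s - C1, 2.0 * B1 * x, X2 * (s - A1)
    root = math.sqrt(qb * qb - 4.0 * qa * qc)
    return sorted(((-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)))


def eta_limits(threshold, xi_opt=-5.02, eta_opt=1.26):
    """Limits for eta at the optimum abscissa; negative values are inadmissible."""
    half = math.sqrt(threshold * (xi_opt**2 + X2) / N_TIMES_X2)
    return max(0.0, eta_opt - half), eta_opt + half


if __name__ == "__main__":
    for level, threshold in (("5%", 18.490), ("1%", 32.964)):
        s = s_limit(threshold)
        print(level, round(s, 2), xi_limits(s), eta_limits(threshold))
```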


CONCLUSION


The main emphasis of this paper has been to show that the concepts developed for tests of
significance in multivariate analysis and for interactions may be carried over to dealing
with a problem of concurrent regressions. It would clearly be possible to extend this work
to regression analysis with two or more independent variates. The quantities required for
testing significance would then, for k variates, come out as latent roots of a (k+1) × (k+1)
matrix. Such a generalization, however, appears to lack practical interest.

APPROXIMATE CONFIDENCE INTERVALS
II. MORE THAN ONE UNKNOWN PARAMETER


By M. S. BARTLETT
University of Manchester


1. RETROSPECT 


In a previous paper (Bartlett, 1953; this will be referred to as I) it was shown that the
approximate confidence interval obtained from the asymptotic normal distribution for
∂L/∂θ, where L = log p is the log likelihood depending on a single unknown parameter θ,
could usually be improved by the use of skewness or further moment corrections. The
purpose of the present paper is to examine the further problems that arise in the case of 
more than one unknown parameter. It is perhaps advisable to point out that while the 
direct use of quantities like L and its derivatives permits a fairly flexible sampling basis, 
the asymptotic normal distribution remains the basis of the approximations. We may note 
(see §7) the application of the proposed method to non-classical problems such as time-series 
analysis. It could not, however, be used in problems for which the central limit theorem 
did not apply. This remark appears relevant if we consider (see §§ 5 and 6) some of the more 
controversial problems involving nuisance parameters, as R. A. Fisher’s fiducial probability 
solutions have been given by some writers (see, for example, Barnard, 1950) an approximate 
frequency interpretation by means of special (and in my opinion rather artificial*) sampling 
interpretations. 

2. SIMULTANEOUS MOMENTS 


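The systematic method of obtaining relations between the moments of ∂L/∂θ, ∂²L/∂θ², etc.,
is obviously extensible to the case of several unknowns θ_i (i = 1, 2, ...). The derivation
of the more general formulae rapidly becomes rather tedious, and I shall only record the
formulae up to the third-order moments of ∂L/∂θ_i. On the usual assumptions about
differentiation with respect to θ_i commuting with the integral sign, we may write (cf. I)

$$1 = \int p(\theta_i+\Delta\theta_i)\,dx
   = \int p(\theta_i)\exp\Big\{\sum_i\Delta\theta_i\frac{\partial L}{\partial\theta_i}
     +\tfrac12\sum_{i,j}\Delta\theta_i\Delta\theta_j\frac{\partial^2L}{\partial\theta_i\partial\theta_j}+\dots\Big\}\,dx,$$

whence

$$E\Big\{\frac{\partial L}{\partial\theta_i}\Big\}=0,\qquad
  E\Big\{\frac{\partial L}{\partial\theta_i}\frac{\partial L}{\partial\theta_j}\Big\}
  = E\Big\{-\frac{\partial^2L}{\partial\theta_i\partial\theta_j}\Big\},$$

$$E\Big\{\frac{\partial L}{\partial\theta_i}\frac{\partial L}{\partial\theta_j}\frac{\partial L}{\partial\theta_k}\Big\}
  = E\Big\{-\frac{\partial^3L}{\partial\theta_i\partial\theta_j\partial\theta_k}\Big\}
  - E\Big\{\frac{\partial^2L}{\partial\theta_i\partial\theta_j}\frac{\partial L}{\partial\theta_k}\Big\}
  - E\Big\{\frac{\partial^2L}{\partial\theta_i\partial\theta_k}\frac{\partial L}{\partial\theta_j}\Big\}
  - E\Big\{\frac{\partial^2L}{\partial\theta_j\partial\theta_k}\frac{\partial L}{\partial\theta_i}\Big\}.$$

Differentiating the second-order equation, we see further that

$$\frac{\partial}{\partial\theta_k}E\Big\{\frac{\partial^2L}{\partial\theta_i\partial\theta_j}\Big\}
  = E\Big\{\frac{\partial^3L}{\partial\theta_i\partial\theta_j\partial\theta_k}\Big\}
  + E\Big\{\frac{\partial^2L}{\partial\theta_i\partial\theta_j}\frac{\partial L}{\partial\theta_k}\Big\}.$$
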

* It is not my purpose to cloud the present discussion with any controversial fog. I would, however,
note that the sampling procedure suggested by Barnard in the Behrens-Fisher problem, if deliberately
carried out, would lead approximately to the solution claimed, but at the cost of throwing away the
information on the variances available in the expected further observations. It seems to me that we
should rarely want to do this unless the resulting loss of accuracy in our inference on the mean difference
was negligible, but this is just the limiting situation for which we do not require small-sample theory
at all.

From these last two equations, we find that

$$E\Big\{\frac{\partial L}{\partial\theta_i}\frac{\partial L}{\partial\theta_j}\frac{\partial L}{\partial\theta_k}\Big\}
  = 2E\Big\{\frac{\partial^3L}{\partial\theta_i\partial\theta_j\partial\theta_k}\Big\}
  + \frac{\partial}{\partial\theta_i}I_{jk} + \frac{\partial}{\partial\theta_j}I_{ik} + \frac{\partial}{\partial\theta_k}I_{ij}, \tag{1}$$

where I_ij = E{−∂²L/∂θ_i∂θ_j}. Formulae of the same order involving less than three
parameters may be obtained from (1) by contraction (i.e. making two or more θ's identical).
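As a check on the contracted single-parameter form of (1), E{(∂L/∂θ)³} = 2E{∂³L/∂θ³} + 3 ∂I/∂θ, the sketch below works through a model that is not in the paper and is chosen only for illustration: a single observation x from an exponential distribution with rate θ, for which L = log θ − θx. Exact rational arithmetic is used, so the two sides can be compared for equality.

```python
from fractions import Fraction
from math import comb, factorial


def raw_moment(k, theta):
    """E[x^k] for an exponential variable with rate theta, i.e. k!/theta^k."""
    return Fraction(factorial(k)) / theta**k


def third_moment_of_score(theta):
    """E[(dL/dtheta)^3] with dL/dtheta = 1/theta - x, expanded binomially."""
    a = 1 / theta
    return sum(
        comb(3, k) * a ** (3 - k) * (-1) ** k * raw_moment(k, theta) for k in range(4)
    )


def right_hand_side(theta):
    """2 E[d3L/dtheta3] + 3 dI/dtheta for the same model."""
    third_derivative = 2 / theta**3  # d3L/dtheta3 = 2/theta^3, not random
    d_information = -2 / theta**3    # I = 1/theta^2, so dI/dtheta = -2/theta^3
    return 2 * third_derivative + 3 * d_information


if __name__ == "__main__":
    for theta in (Fraction(1), Fraction(3, 2), Fraction(5)):
        print(theta, third_moment_of_score(theta), right_hand_side(theta))
```

Both sides reduce to −2/θ³ for this model.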


3. CONFIDENCE REGIONS 


As already noted in I, the extension from approximate confidence intervals to confidence
regions is in principle straightforward. Thus for two parameters θ₁ and θ₂ our first
large-sample approximation is to consider the joint quantities ∂L/∂θ₁, ∂L/∂θ₂ as normal with zero
means and known covariance-matrix. From these we may, for example, construct an
approximate χ² expression, writing this (for convenience later) in the unsymmetric form

$$\Big(\frac{\partial L}{\partial\theta_1}\Big)^2\Big/I_{11}
 + \Big(\frac{\partial L}{\partial\theta_2}-\frac{I_{12}}{I_{11}}\frac{\partial L}{\partial\theta_1}\Big)^2
   \Big/\Big(I_{22}-\frac{I_{12}^2}{I_{11}}\Big). \tag{2}$$

Values of θ₁ and θ₂ for which the expression in (2) remains less than χ₀², the critical value of
χ² for two degrees of freedom, define an approximate confidence region. An even simpler
expression asymptotically equivalent to (2) is −2[L − L_max], discussed more fully later.*
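It is, perhaps, hardly necessary to consider any skewness corrections to (2), for in it only
squares of the approximately normal quantities are involved. Any skewness corrections
would merely help to locate the minimum and maximum 'admissible' values of θ₁ and θ₂
more symmetrically (in terms of probability contributions to the total probability of error).
However, to illustrate the procedure these corrections will be deduced. Consider

$$
\begin{aligned}
T_1(\theta_1,\theta_2) &= \frac{\partial L}{\partial\theta_1}
   + \lambda_1\Big[\Big(\frac{\partial L}{\partial\theta_1}\Big)^2 - I_{11}\Big], \\
T_2(\theta_1,\theta_2) &= \frac{\partial L}{\partial\theta_2} - \frac{I_{12}}{I_{11}}\frac{\partial L}{\partial\theta_1}
   + \lambda_{20}\Big[\Big(\frac{\partial L}{\partial\theta_1}\Big)^2 - I_{11}\Big]
   + \lambda_{11}\Big[\frac{\partial L}{\partial\theta_1}\frac{\partial L}{\partial\theta_2} - I_{12}\Big]
   + \lambda_{02}\Big[\Big(\frac{\partial L}{\partial\theta_2}\Big)^2 - I_{22}\Big].
\end{aligned}
\tag{3}
$$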


In order to replace the random variables in (2) by more nearly normal variables, we require
the third-order cumulants of T₁ and T₂ to be zero to the next-order approximation. This gives

$$\kappa_{30} + 6\lambda_1 I_{11}^2 = 0 \quad (\text{cf. I}),$$


I 
(Ke = 9) + 2Ago Ti, + 2Aq3 Tur Tie + 2Age Ti, = 9, 


; R, R, ) (4) 
(2 . kn tT * ks) + 2A (Tu Tea Ta) + 4a ia foe I 2) = 0, 





(os 3 i + ae Koy - Keo) + Bhaa( 22 -7)' = 0. 

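The last three equations may be solved successively from the bottom to the top equation;
in the particular case when I₁₂ = 0 they become

$$\kappa_{21} + 2\lambda_{20}I_{11}^2 = 0,\qquad \kappa_{12} + 2\lambda_{11}I_{11}I_{22} = 0,\qquad \kappa_{03} + 6\lambda_{02}I_{22}^2 = 0. \tag{5}$$

* It is convenient to defer discussion of this alternative until §8, as so far no explicit use of
maximum-likelihood estimates has been necessary, although we shall find their introduction in
general unavoidable from §4 onwards.

Thus for a normal sample with

$$L = -\tfrac12 n\log\theta_2 - \frac{n(\bar x-\theta_1)^2}{2\theta_2} - \frac{(n-1)s^2}{2\theta_2}, \tag{6}$$

so that

$$\frac{\partial L}{\partial\theta_1} = \frac{n(\bar x-\theta_1)}{\theta_2},\qquad
  \frac{\partial L}{\partial\theta_2} = -\frac{n}{2\theta_2} + \frac{(n-1)s^2+n(\bar x-\theta_1)^2}{2\theta_2^2},$$

$$\frac{\partial^2L}{\partial\theta_1^2} = -\frac{n}{\theta_2},\qquad
  \frac{\partial^2L}{\partial\theta_1\partial\theta_2} = -\frac{n(\bar x-\theta_1)}{\theta_2^2},\qquad
  \frac{\partial^2L}{\partial\theta_2^2} = \frac{n}{2\theta_2^2} - \frac{(n-1)s^2+n(\bar x-\theta_1)^2}{\theta_2^3},$$

$$I_{11} = \frac{n}{\theta_2},\qquad I_{12} = 0,\qquad I_{22} = \frac{n}{2\theta_2^2},$$

$$\kappa_{30} = 0,\qquad \kappa_{21} = \frac{n}{\theta_2^2},\qquad \kappa_{12} = 0,\qquad \kappa_{03} = \frac{n}{\theta_2^3},$$

we put

$$\lambda_1 = \lambda_{11} = 0,\qquad \lambda_{20} = -\frac{1}{2n},\qquad \lambda_{02} = -\frac{2\theta_2}{3n},$$

$$
\begin{aligned}
T_1 &= \frac{n(\bar x-\theta_1)}{\theta_2}, \\
T_2 &= -\frac{n}{2\theta_2} + \frac{(n-1)s^2+n(\bar x-\theta_1)^2}{2\theta_2^2}
      - \frac{1}{2n}\Big[\frac{n^2(\bar x-\theta_1)^2}{\theta_2^2}-\frac{n}{\theta_2}\Big]
      - \frac{2\theta_2}{3n}\Big[\Big(\frac{\partial L}{\partial\theta_2}\Big)^2-\frac{n}{2\theta_2^2}\Big].
\end{aligned}
\tag{7}
$$

Now as the crude approximate region is given by

$$\frac{n(\bar x-\theta_1)^2}{\theta_2} + \frac{2\theta_2^2}{n}\Big(\frac{\partial L}{\partial\theta_2}\Big)^2 = \chi_0^2,$$

we may write in the next approximation

$$
\begin{aligned}
T_2 &= -\frac{n}{2\theta_2} + \frac{(n-1)s^2}{2\theta_2^2} + \frac{1}{2\theta_2}
       + \frac{1}{3\theta_2}\Big[1-\chi_0^2+\frac{n(\bar x-\theta_1)^2}{\theta_2}\Big] \\
    &= \frac{n-1}{2\theta_2}\Big(\frac{s^2}{\theta_2}-1\Big)
       + \frac{1}{3\theta_2}\Big[1-\chi_0^2+\frac{n(\bar x-\theta_1)^2}{\theta_2}\Big],
\end{aligned}
\tag{8}
$$

obtaining as the next-order approximating set for the confidence region

$$\frac{n(\bar x-\theta_1)^2}{\theta_2} + \frac{2\theta_2^2}{n}T_2^2 = \chi_0^2, \tag{9}$$
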

where T₂ is given by (8). As an obvious check on this result, we may note that if we had
initially made use of the facts that x̄ and s² are sufficient independent statistics for θ₁ and
θ₂, we should have arrived at practically the same equation (9) by making use of the normal
approximation for s² developed in I (as noted in I, this is not the best normal
approximation for s², but the one arrived at by the present systematic procedure). The only difference
would be that the coefficient of T₂² in (9) would be 2θ₂²/(n − 1), this difference arising from the
neglect, justifiable to the order of approximation considered, of the alteration in the
variance of T₂ due to the third-order moment corrections.


4. THE CASE OF 'NUISANCE PARAMETERS'


We come now to the more formidable problem of 'nuisance parameters'. To fix our problem,
consider first the case of two parameters θ₁ and θ₂, with θ₂ the nuisance parameter. The
quantity involving ∂L/∂θ₁, ∂L/∂θ₂, which provides an alternative to the maximum-likelihood
estimate θ̂₁ is

$$\frac{\partial L}{\partial\theta_1} - \frac{I_{12}}{I_{22}}\frac{\partial L}{\partial\theta_2}, \tag{10}$$

with variance I₁₁ − I₁₂²/I₂₂ = I₁₁.₂, say. But only in large samples, when the expression in (10)
and ∂L/∂θ₂ become uncorrelated normal variables, with ∂L/∂θ₂ an effectively sufficient
statistic for θ₂, can we expect the standardized variable

$$T = \Big(\frac{\partial L}{\partial\theta_1} - \frac{I_{12}}{I_{22}}\frac{\partial L}{\partial\theta_2}\Big)\Big/\sqrt{I_{11.2}} \tag{11}$$

to be completely insensitive to the estimate we substitute for θ₂. In smaller samples this
substitution will affect the distribution of T. In principle this effect can be successively
allowed for, this being the proposed procedure; the effect of the correction at one stage may
be reconsidered, a further correction of a higher order introduced, and so on. As a general
method, this is, however, evidently going to become rather intractable after the first stage,
for the previous advantages of working directly with the derivatives ∂L/∂θ₁, ∂L/∂θ₂ are in
general lost.

It is not possible to ignore the sufficiency properties, if any, that exist for θ₂. For if
∂L/∂θ₂ is equivalent, even in small samples, to a single sufficient statistic for θ₂ (when θ₁ is
assumed known), we can* proceed from T by building up a polynomial series solution in
∂L/∂θ₁ and ∂L/∂θ₂ which is statistically independent of ∂L/∂θ₂ and hence mathematically
independent of θ₂. But if ∂L/∂θ₂ has not got such sufficiency properties, it can only be
considered 'sufficient' for θ₂ if regarded as a random function in the parameter θ₂, and statistical
independence merely of the random variable ∂L/∂θ₂ would appear to be no longer adequate.
I thought at one stage that this difficulty might be overcome by eliminating also the
dependence on higher derivatives such as ∂²L/∂θ₂², but this may be shown (by a
counter-example, see §6) to be impossible in general. In the general case it seems therefore necessary
to fall back on the very cumbersome substitution procedure referred to in the first paragraph
above.

Fortunately this substitution, if the maximum-likelihood estimate of θ₂ for given θ₁ is
used, only affects the distribution of the statistic T to the order of approximation O(1/n),
where n indicates the sample size, a stage higher than the skewness correction, which is in
general of O(1/√n). For we may write the maximum-likelihood estimate of θ₂ (for given θ₁)
approximately as

$$\theta_2 + \frac{\partial L}{\partial\theta_2}\Big/I_{22},$$

so that

$$T_0 = T + \frac{\partial L}{\partial\theta_2}\frac{\partial T}{\partial\theta_2}\Big/I_{22}. \tag{12}$$

The estimated value of the second term in (12) is of course zero. Its sampling effect on T₀ may,
however, be investigated. The first correcting term in E{(T₀)^r} is

$$
rE\Big\{T^{r-1}\frac{\partial L}{\partial\theta_2}\frac{\partial T}{\partial\theta_2}\Big\}\Big/I_{22}
 = E\Big\{\frac{\partial T^r}{\partial\theta_2}\frac{\partial L}{\partial\theta_2}\Big\}\Big/I_{22}
 = \Big[\frac{\partial}{\partial\theta_2}E\Big\{T^r\frac{\partial L}{\partial\theta_2}\Big\}
   - E\Big\{T^r\Big(\frac{\partial L}{\partial\theta_2}\Big)^2\Big\}
   + E\Big\{T^r\Big(-\frac{\partial^2L}{\partial\theta_2^2}\Big)\Big\}\Big]\Big/I_{22}.
$$

From the asymptotic independence of T and ∂L/∂θ₂, it is easy to see that all these terms
vanish to the first order (for the second and third, we replace −∂²L/∂θ₂² in the third term by
I₂₂ to the first order, and couple with the second term). Other asymptotically efficient
estimates will be equivalent to the same order and may be used if more convenient; their
effect at higher stages would have to be investigated separately.

* (Added in proof.) I do not mean to imply that we should, for in this case the probability of the
sample, given ∂L/∂θ₂, provides a distribution independent of θ₂ (cf. Bartlett, 1936), and the series
expansion, illustrated by §5 on the t-test, is unnecessary except perhaps for finding approximations.

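For reference if required, the next term in the expansion of θ̂₂ gives

$$\hat\theta_2 = \theta_2 + \frac{\partial L}{\partial\theta_2}\Big/I_{22}
 + \Big\{\frac{\partial L}{\partial\theta_2}\Big[\frac{\partial^2L}{\partial\theta_2^2}+I_{22}\Big]
 + \tfrac12\Big(\frac{\partial L}{\partial\theta_2}\Big)^2\frac{\partial^3L}{\partial\theta_2^3}\Big/I_{22}\Big\}\Big/I_{22}^2, \tag{13}$$

whence

$$T_0 = T + \frac{\partial L}{\partial\theta_2}\frac{\partial T}{\partial\theta_2}\Big/I_{22}
 + \Big[f\frac{\partial T}{\partial\theta_2}
 + \tfrac12\Big(\frac{\partial L}{\partial\theta_2}\Big)^2\frac{\partial^2T}{\partial\theta_2^2}\Big]\Big/I_{22}^2, \tag{14}$$

where f is the expression in curly brackets in (13).

A result of some interest which may be derived from an expansion like (13) is

$$E\{\hat\theta\} = \theta - \Big\{\frac{\partial I}{\partial\theta}
 + \tfrac12E\Big(\frac{\partial^3L}{\partial\theta^3}\Big)\Big\}\Big/I^2 + \dots, \tag{15}$$

where the suffix 2 has been dropped for convenience, as (13) and (15) refer effectively to
a single unknown parameter.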

While it would be too laborious to carry through the complete details of the general
correction to the O(1/n) stage, it may be advisable to indicate how this would proceed, so
that the iterative nature of the corrections is made clear. To O(1/√n) we have

$$S = \big[T - \tfrac16\gamma_1(T^2-1)\big],$$

where γ₁ is the skewness of T. Owing both to the properties of S and to the substitution of
the estimate S₀ for S, so that S₀ is related to S by an expansion similar to that for T₀ in (14),
we have O(1/n) additions to the mean, variance, third and fourth cumulants of, say,
c₁, c₂, c₃ and c₄ respectively. Then a new quantity R correct to O(1/n) is (cf. Cornish &
Fisher, 1937)

$$R = S_0 - c_1 - \tfrac12c_2S_0 - \tfrac16c_3(S_0^2-1) - \tfrac1{24}c_4(S_0^3-3S_0); \tag{16}$$

the effect of further substitution for θ₂ in c₁, c₂, c₃ and c₄ will not affect the distribution of
R, before at least O(1/n²).
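The two normalizing steps just described can be written as small functions. The sketch below is an illustration only: it assumes the usual Cornish-Fisher form for the normalizing polynomial, with coefficients ½, ⅙ and 1/24 on the variance, third-cumulant and fourth-cumulant additions, and the function and argument names are ours.

```python
def skewness_corrected(t, gamma1):
    """First-stage quantity S = T - gamma1 * (T^2 - 1) / 6."""
    return t - gamma1 * (t * t - 1.0) / 6.0


def cornish_fisher_normalised(s0, c1=0.0, c2=0.0, c3=0.0, c4=0.0):
    """Second-stage quantity R built from S0 and the O(1/n) cumulant additions.

    c1..c4 are the additions to the mean, variance, third and fourth cumulants.
    """
    return (
        s0
        - c1
        - 0.5 * c2 * s0
        - c3 * (s0 * s0 - 1.0) / 6.0
        - c4 * (s0**3 - 3.0 * s0) / 24.0
    )


if __name__ == "__main__":
    s = skewness_corrected(1.8, gamma1=0.4)
    print(s, cornish_fisher_normalised(s, c1=0.01, c2=0.02, c3=0.0, c4=0.05))
```

With all corrections zero both functions return their argument unchanged, which is the large-sample limit.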

As noted in I, a statement of general conditions could hardly be expected under which the
required monotonic relation existed between our final statistical function and θ₁ if a valid
confidence interval is to be possible, but except for very small samples we should not expect
these higher-order corrections to disturb any such relation holding in the relevant range;
this expectation has been supported by numerical examples.





5. THE t-TEST AND THE BEHRENS-FISHER PROBLEM 


It seems of some interest to demonstrate the procedure first in the case of the standard
t-test, even although not only do we not require it here but from the above remarks it does
not illustrate the completely general case.

From the expression (6) we obtain for the quantity T defined in (11)

$$T = \frac{\sqrt n\,(\bar x-\theta_1)}{\sqrt{\theta_2}}.$$

From the sufficiency property of ∂L/∂θ₂ in this example, a successive elimination of the
higher cumulants between ∂L/∂θ₂ and the modified series

List fa, Hol (eg) ~™ | + 930, %e | +0 (5) 2a + 


will eliminate the dependence of T;, on 6,. We choose the yv’s also so that 7, becomes normal. 
The resulting equations are similar to (5), being 


K39 + Bao li, = 9, Koy + 2y Lyte = 9, Kip + 2oe 13, = 9, 











OF fog = 9, fy, = — 9/2, og = 0, 80 to the next approximation stage 
T = _n(z—- me (1 -; (ne 1) 
ane” 2 nO, 
/n(% —9,) 
(n— 1) s?+n(%—6,)? 
Dara ica! 


This result is not of course yet correct to O(1/n), but there is little point in continuing
further with the corrections. If we make use of (12), we reach the same result to this order
more simply. Higher approximations would merely develop the Fisher (1941) asymptotic
expansion for t.

In the case of the Behrens-Fisher problem of the difference of two independent normal
variables with separately unknown variances, we have

$$L = -\tfrac12\log(\theta_1+\theta_2) - \frac{(x-\theta_3)^2}{2(\theta_1+\theta_2)}
     - \tfrac12n_1\log\theta_1 - \frac{n_1v_1}{2\theta_1}
     - \tfrac12n_2\log\theta_2 - \frac{n_2v_2}{2\theta_2}, \tag{17}$$

where to correspond with the degrees of freedom n₁, n₂ for the estimated variances v₁, v₂,
the unknown variances of x̄₁, x̄₂ are taken as θ₁ and θ₂, and θ₃ for the unknown true difference
corresponding to x = x̄₁ − x̄₂.

We easily deduce that

$$T = \frac{x-\theta_3}{\sqrt{(\theta_1+\theta_2)}},$$

and if we substitute the most convenient estimates v₁, v₂ for θ₁, θ₂ (these are not the strict
maximum-likelihood estimates for given θ₃ but are asymptotically equivalent as n₁, n₂
increase), we arrive at the usual approximate t-quantity (see, for example, Welch (1947)).
To reach a result correct to O(1/n) we should require (i) to correct the kurtosis of T (its
skewness is obviously zero), (ii) to make the O(1/n) correction corresponding to the effect
of the substitution of v₁, v₂ for θ₁, θ₂. There is little point in carrying this through, for it is
clear that it would lead to the next order correction in the asymptotic series solution to this
problem proposed by Welch (1947). My impression (with which Dr Welch may not
necessarily agree) is that his remarkably ingenious solution to this problem may be logically
classified with the further asymptotic expansions based on the normal approximation which
are the subject-matter of I and the present paper.

If we alternatively tried to extend the first procedure illustrated for the t-test, to cover
two nuisance parameters θ₁ and θ₂, we should eliminate the dependence on both ∂L/∂θ₁ and
∂L/∂θ₂. It has been verified to the first stage of the approximation procedure that this leads
similarly to a consideration of the approximate t-quantity (x − θ₃)/√(v₁ + v₂), but we would
not expect it to extend to higher stages for the reasons given in §4, as ∂L/∂θ₁ and ∂L/∂θ₂ do
not correspond to a pair of sufficient statistics for θ₁ and θ₂ (for given θ₃).

6. FISHER'S ANALYSIS OF VARIANCE PROBLEM


A simpler problem for which such difficulties may be illustrated is the second problem
considered by Fisher (1935) in his first derivation of solutions in terms of simultaneous fiducial
probability. This problem may be stated for our present purpose as follows. We have two
χ²-quantities

$$\frac{nv}{\theta_1+\lambda\theta_2},\qquad \frac{n_2v_2}{\theta_2},$$

with degrees of freedom n and n₂, and require a confidence interval for θ₁. Here

$$L = -\tfrac12n\log(\theta_1+\lambda\theta_2) - \frac{nv}{2(\theta_1+\lambda\theta_2)}
     - \tfrac12n_2\log\theta_2 - \frac{n_2v_2}{2\theta_2}
   = L(n) + L(n_2),\ \text{say},$$

$$\frac{\partial L}{\partial\theta_1} = \frac{n}{2(\theta_1+\lambda\theta_2)^2}\,[v-(\theta_1+\lambda\theta_2)],\qquad
  \frac{\partial L}{\partial\theta_2} = \lambda\frac{\partial L}{\partial\theta_1} + \frac{\partial L(n_2)}{\partial\theta_2},$$

$$I_{11} = \frac{n}{2(\theta_1+\lambda\theta_2)^2},\qquad I_{12} = \lambda I_{11},\qquad
  I_{22} = \lambda^2I_{11} + \frac{n_2}{2\theta_2^2},$$

whence after reduction we obtain

$$T = (v-\lambda v_2-\theta_1)\sqrt{\Big\{\frac{nn_2}{2[n_2(\theta_1+\lambda\theta_2)^2+\lambda^2n\theta_2^2]}\Big\}}. \tag{18}$$

While this problem appears at first sight rather simple, depending on two joint sufficient
statistics v and v₂ for θ₁ and θ₂, the difficulty arises from the absence of a sufficient statistic
for θ₂ (for given θ₁). Hence we cannot expect it to suffice to modify T to be independent of
∂L/∂θ₂. In this example we may notice further that it is impossible to make T independent
also of, say, ∂²L/∂θ₂², for as T, ∂L/∂θ₂ and ∂²L/∂θ₂² are all linear in v and v₂, there is a linear
relation between them. Thus we return to the substitution procedure recommended in §4.

It will also be seen in this example that (as in the Behrens-Fisher problem) the use of the
normal approximation depends on two sample numbers not being too small, in this case
n and n₂. The best estimate of θ₂ for given θ₁ is not v₂, but can be obtained from the equation
∂L/∂θ₂ = 0 using v₂ as a first approximation. This leads to the estimate

$$v_2 + \frac{\lambda n(v-\theta_1-\lambda v_2)}{(\theta_1+\lambda v_2)^2}
  \Big/\Big(\frac{n_2}{v_2^2}+\frac{\lambda^2n}{(\theta_1+\lambda v_2)^2}\Big). \tag{19}$$

We may not substitute the simpler estimate v₂ (unless n ≪ n₂), for though its accuracy is of
the same order as the estimate in (19), it is not asymptotically independent of T and hence
the effect of the substitution will be of O(1/√n, 1/√n₂) and must also be corrected even at the
first stage of the correction.
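The reduction leading to the standardized statistic, and the one-step estimate of the nuisance parameter, can be checked numerically. The sketch below is an illustration under assumptions of ours: the two χ²-quantities are nv/(θ₁ + λθ₂) and n₂v₂/θ₂, T is built from the general definition (11), and the one-step estimate is a single scoring step from v₂ using the expected information. All function names are ours.

```python
import math


def information(theta1, theta2, lam, n, n2):
    """Elements I11, I12, I22 of the information matrix for this problem."""
    a = theta1 + lam * theta2
    i11 = n / (2.0 * a * a)
    return i11, lam * i11, lam * lam * i11 + n2 / (2.0 * theta2**2)


def t_from_definition(v, v2, theta1, theta2, lam, n, n2):
    """T from (11): (dL/dθ1 − (I12/I22) dL/dθ2) / sqrt(I11 − I12²/I22)."""
    a = theta1 + lam * theta2
    d1 = n * (v - a) / (2.0 * a * a)
    d2 = lam * d1 + n2 * (v2 - theta2) / (2.0 * theta2**2)
    i11, i12, i22 = information(theta1, theta2, lam, n, n2)
    return (d1 - i12 / i22 * d2) / math.sqrt(i11 - i12 * i12 / i22)


def t_closed_form(v, v2, theta1, theta2, lam, n, n2):
    """Reduced form: (v − λ v2 − θ1) * sqrt(n n2 / (2 [n2 (θ1 + λθ2)² + λ² n θ2²]))."""
    a = theta1 + lam * theta2
    return (v - lam * v2 - theta1) * math.sqrt(
        n * n2 / (2.0 * (n2 * a * a + lam * lam * n * theta2**2))
    )


def theta2_one_step(v, v2, theta1, lam, n, n2):
    """One scoring step from v2 towards the solution of dL/dθ2 = 0 for given θ1."""
    a = theta1 + lam * v2
    score = lam * n * (v - a) / (a * a)
    info = n2 / (v2 * v2) + lam * lam * n / (a * a)
    return v2 + score / info


if __name__ == "__main__":
    args = dict(v=1.960, v2=2.240, theta1=1.0, theta2=2.0, lam=0.5, n=12, n2=12)
    print(t_from_definition(**args), t_closed_form(**args))
    print(theta2_one_step(1.960, 2.240, theta1=1.0, lam=0.5, n=12, n2=12))
```

The two forms of T agree to rounding error, and the one-step estimate leaves v₂ unchanged when v already equals θ₁ + λv₂.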


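The skewness of T, unaffected to the first order by the substitution, is

$$\gamma_1 = 8\Big\{\frac{(\theta_1+\lambda\theta_2)^3}{n^2} - \frac{\lambda^3\theta_2^3}{n_2^2}\Big\}I_{11.2}^{3/2}, \tag{20}$$

where I₁₁.₂ is the expression under the square root in (18).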













Thus corrected to O(1/√n, 1/√n₂) we may consider our confidence interval for θ₁ given
by either of the two equations (cf. I)


T —y,(T?—-1) = tz, (21a) 

T—tyi(?—-1l)= +4, (216) 

where in (21a) and (21b) it is understood that the estimate in (19) is written for θ₂. An
equation correct to O(1/n, 1/n₂) would be rather complicated, for in addition to a correction
for the substitution for θ₂, a correction for kurtosis would be required; this would make the
total correction exceedingly cumbersome, as a bias, variance correction and the asymmetry
must all be allowed for at this stage (see §4).

However, from the check made in I on the accuracy of the confidence interval for a single
variance, the use of equation (21a) or (21b) would be not unreasonable for moderate-sized
n and n₂.














Table 1

                                                   Lower and upper
n     χ² quantities    θ₂     v        v₂          P = 0.05 limits for θ₁
12    11.76, 13.43     0      0.980    0           0.55 (0.56), 2.32 (2.25)
                       2      1.960    2.240       0, 3.53
                       4      2.940    4.476       0, 4.74
12    15.24, 5.46      0      1.270    0           0.72 (0.72), 3.00 (2.92)
                       2      2.540    0.910       0.91, 5.59
                       4      3.810    1.820       0.86, 8.06
6     1.85, 7.80       0      0.308    0           0.14 (0.15), 1.28 (1.13)
                       2      0.617    2.600       0, 1.16
                       4      0.926    5.196       0, 1.15
6     5.64, 4.75       0      0.940    0           0.44 (0.45), 3.92 (3.46)
                       2      1.880    1.583       0, 7.09
                       4      2.820    3.167       0, 10.32


























To illustrate and examine the numerical use of equation (21b), it was applied to a few
artificial sampling estimates in the case when there are only two observations per group, so
that in the analysis of variance table the between-and-within-groups have an equal number
of degrees of freedom, n. The true mean square between groups is 2θ₁ + θ₂, so to correspond
to the notation above (with λ = ½) we take as v one-half of the between-groups mean square
and v₂ as the within-groups mean square. Two cases were taken for n = 12, and two for
n = 6.* In addition, the same effective χ² quantities were used when comparing the results
for two different values of θ₂ (2 and 4), so that the extent to which the inference depended
on θ₂ could be more easily seen. On this last point, however, it should be recalled that the
statistic used is only independent of our error in estimating θ₂ up to and including O(1/√n) in
distribution, and in any particular case (for which the error in θ̂₂ will alter with θ₂ even for
the same χ²) will still change to an extent of O(1/√n).

Moreover, the information on θ₁ is of course less as θ₂ increases, so the width of the
interval should in any case tend to increase with θ₂. For comparison the case θ₂ = 0 has been
included, as this reduces to the single variance problem discussed in I, and the exact
confidence interval in this case can also be given in Table 1 (in brackets). The approximate
intervals for θ₂ ≠ 0 were obtained from equation (21b) graphically, plotting against θ₁ for
the lower limit and 1/θ₁ for the upper limit (these scales are indicated by the form of the
equation at the permissible extreme points θ₁ → 0 and 1/θ₁ → 0). The true value of θ₁
throughout was unity.

* It had originally been intended to examine ten cases each, but as this would still be far from
sufficient to test the approximation adequately, it was later thought hardly worth while. The sample
quantities v and v₂ made use of were partly selected from the larger set as interesting combinations to
try out, and thus are not entirely random.

The lower limit for $\theta_1$ will often be the minimum permissible value zero, rather than the
value corresponding to a deviate of $-1.645$, which may well not be reached, especially for small
$n$ and large $\theta_2$. This is not unexpected, as it is well known that even the strict
maximum-likelihood estimate $v - \lambda v_2$ in this problem may often turn out to be negative;
it implies, however, that we are not making full use of the 0.05 risk of error at the lower value,
so that it seems useful to think of the error risks for the lower and upper limits separately.*


7. APPLICATION TO TIME-SERIES ANALYSIS 


It has previously been noted (Bartlett, 1952, at end of paper) that the technique may
be applied to unknown parameters in time-series. We are now in a position to consider
further the solution proposed there for the auto-correlation of a normal Markoff process,
in discrete time, and to give a modified solution corrected to $O(1/\sqrt{n})$. The quantity $L$ is,
apart from the end-correction (which will not affect our solution until $O(1/n)$),

$$L = -\tfrac12 n \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{r=1}^{n} Y_r^2, \qquad (22)$$

where $X_r - \beta X_{r-1} = Y_r$.

(For simplicity we assume, as before, that $E\{Y\} = 0$. In (22) the observation $X_0$ has been
discarded.) We have†

$$\frac{\partial L}{\partial \beta} = \sum_{r=1}^{n} (X_r - \beta X_{r-1}) X_{r-1} / \sigma^2,$$

$$I_{11} = \frac{n}{1 - \beta^2}, \qquad I_{12} = 0, \qquad I_{22} = \frac{n}{2\sigma^4},$$

where $\theta_1 = \beta$, $\theta_2 = \sigma^2$. We have further

PL 9, Fy _ _ 2h _ 

op OB > OR 

oL\* 6nB 
h Bi (2%\"\ .. —9"F _ 
ore (sa) |- a= 
We note finally that oy = z X21 — ?)/n. (23) 
r=1 


Hence our approximate confidence interval for 8 is determined from the equation 


z (%, = PX,_) X,-4/*% B 
v{n/(1 — B*)} ~~ Bay —-l)=+y4. (24) 


* In the case n = n, the expression in (21) at 0, = 0 is Jn(v—Av,)/(v+Av,), which must be greater 
than 4 for a non-zero lower limit. When Av,/v< }, it may be shown that the monotonic relation with 
6, breaks down near 6, = 0, but of course if 6, is near zero Av,/v~ 1+0(1/Jn), so that no difficulty 
should arise within the ‘admissible’ range of 0, except perhaps for very small n. 

+ In equation (25) of the paper cited, o was inadvertently omitted, but this did not affect the 
deduced confidence interval equation (26). 








It is interesting to notice the appearance of a non-zero skewness correction, in contrast with
the classical regression problem. 

When we are able to assume normal residuals (or some other complete specification of the 
process) similar methods can clearly be applied in other time-series problems. 
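As a rough numerical illustration of the quantities entering this section, the sketch below simulates a normal Markoff series and evaluates the score for $\beta$ together with its information. It assumes the standard forms $\partial L/\partial\beta = \sum (X_r - \beta X_{r-1}) X_{r-1}/\sigma^2$ and $I_{11} = n/(1-\beta^2)$; the function names are hypothetical, and the skewness correction of equation (24) is deliberately not included.

```python
import random


def simulate_markov(n, beta, sigma, seed=0):
    """Simulate X_0, ..., X_n with X_r = beta * X_{r-1} + Y_r, Y_r ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    # Start from the stationary distribution, variance sigma^2 / (1 - beta^2).
    x_prev = rng.gauss(0.0, sigma / (1.0 - beta ** 2) ** 0.5)
    xs = [x_prev]
    for _ in range(n):
        x_prev = beta * x_prev + rng.gauss(0.0, sigma)
        xs.append(x_prev)
    return xs


def score_beta(xs, beta, sigma2):
    """Assumed score dL/d(beta) = sum_r (X_r - beta * X_{r-1}) * X_{r-1} / sigma^2."""
    return sum((xs[r] - beta * xs[r - 1]) * xs[r - 1] for r in range(1, len(xs))) / sigma2


def information_beta(n, beta):
    """Assumed information I_11 = n / (1 - beta^2)."""
    return n / (1.0 - beta ** 2)


xs = simulate_markov(n=200, beta=0.6, sigma=1.0, seed=1)
# First-order standardized score; roughly a standard normal deviate at the true beta.
print(score_beta(xs, 0.6, 1.0) / information_beta(200, 0.6) ** 0.5)
```

Solving for the values of $\beta$ at which this standardized quantity equals a chosen normal deviate gives only the uncorrected interval; the $O(1/\sqrt{n})$ correction discussed in the text would have to be added separately.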


8. RELATION WITH GOODNESS OF FIT TESTS 


It was pointed out in §3 that an alternative to the approximate $\chi^2$ expression in
$\partial L/\partial\theta_i$ would be to use $-2(L - L_{\max})$, where $L_{\max}$ denotes the
maximum of $L$ for choice of the unknown $\theta_i$. This last expression is more familiar in the
use of the $\chi^2$ test as a goodness of fit of hypothetical values of the $\theta_i$, but the two
uses are of course closely connected. In this final section we shall demonstrate to the next order
of approximation ($O(1/\sqrt{n})$) the asymptotic equivalence (in distribution) of this alternative
$\chi^2$ approximation,* equivalent to the likelihood ratio $\lambda$ and given by

$$-2\log\lambda = -2(L - L_{\max}). \qquad (25)$$

Whether we make use of $\lambda$ or the general quadratic formula

$$I^{ij}\,\frac{\partial L}{\partial\theta_i}\,\frac{\partial L}{\partial\theta_j}, \qquad (26)$$

where in (26) $I^{ij}$ denotes the inverse of $I_{ij}$ (the summation convention is used throughout
this section), will thus to this order be a matter of convenience. Formula (25) is formally
simpler than (26), but involves finding the simultaneous maximum-likelihood estimates $\hat\theta_i$.

We have (expanding L(6,) about L(6;), rather than, as is perhaps more usual, L(@;) about 


L(4;)) : - “eae 5 aL 
L(6,) = L(0;) +\(6,—9) 70,7 4(4,—9,) (4;— 95) aon 
eL 


+46,-4,) (6; -9;) (6,—- (27) 


°x) 56.80,00,,* 


The relation between 6, and L may be developed analogously to the case of one unknown 
quoted in §4 (equation (13)). We have 


oL az eL 4 s SL 
0= 20, + (s- 99) 56-26, x6, + 295-95) (7.— 9) 55,20,.06, + 


a% 27, aL 
~(6,- —9;) Iy+{6,- 9;) (a7 00; +13) +46;— 9;) (6, x) 00 gabe t 


Denote the inverse of $I_{ij}$ by $I^{ij}$. Then


‘ aL jaLT OL eb eL 
—- ik hj Jik ik Thi Jom pines a os « E-L 
y—9n) = E55. +l 30,00, +Ig|I Gia tei ti $0,58,30-}* 
- 185 +fis (28) 
say, where 
Se “]s erjpmob OL OL se 
fa = (5a. 90,00,0 te ee 0,00, 00,00,00 2 


* For a discussion of the large-sample $\chi^2$ approximation for the likelihood ratio $\lambda$
(including the more general case where $L$ is replaced by its maximum $L'_{\max}$ with respect to a
smaller subclass of the unknown parameters), see Wilks (1938), where it is shown that
$-2\log\lambda$ is equivalent to a $\chi^2$ except for terms of $O(1/\sqrt{n})$.

Hence


ae ae = ab Foe .) 
sf i 


ki hj 
20,, 20, * \**39, no +1856," a6, \50,06, 


oL y ng OL tym OL BL 
ki A: . TO 
ue sai 5 35 +H 00; 0," 3, 00,00;06,, ree 


or, if we put $\partial^2 L/\partial\theta_i\,\partial\theta_j \sim -I_{ij}$ in the fourth term on the right, this may be written


+2 


OL AL (ALL [rir OL ‘ GREE ab res eaten 
die #9; 96, * (o0,0, (7 176,00, +") + 330,30, 30, 30,00,00.. = \+ 
aL aL 
oe FOP aie es 
1°00, 90, * ® (29) 


say. Now as the skewness of $\partial L/\partial\theta_i$ does not affect the distribution of the
quadratic first term, it is, even to $O(1/\sqrt{n})$, a valid $\chi^2$ quantity, as we saw in §3.
It remains to consider the effect of the remainder term $R$. Now although $R$ is of
$O(1/\sqrt{n})$, we can readily show that $E\{R\}$ and
$E\{I^{ij}\,\partial L/\partial\theta_i\,\partial L/\partial\theta_j\,R\}$ are of $O(1/n)$, so that
$-2(L - L_{\max})$ has the same mean and variance as
$I^{ij}\,\partial L/\partial\theta_i\,\partial L/\partial\theta_j$ to $O(1/\sqrt{n})$. The same
method can be applied also for the higher moments. To this order the same $\chi^2$ approximation
may thus be used. Although these average values involve higher-order moments of $L$ derivatives
than we have listed, for our immediate purpose we do not require their explicit values. We merely
note that


2 
ES oe (ag-aor eL 


158,38; 30,00, +Iy)|< O(n), 
= aL aL = BL 


& 


aA AM DA. a0 aA AA 2 

20, 00, 00), 80,00, 56-| < O(n2), 
aL aL\ aL aL ( #L 
00, 00,) 06, 2 

(5, 20) 00, 08m (5-36; + he) < O(n?), 
aLaL\ aL aL aL = OL 

EA (59, 55) 30.00. 20. 54-00.06,| <9 nt 
30, 20,) 00, 20,, 30, 50,00,00,) <””) 


E 


and as $I^{ij}$ is $O(n^{-1})$, the result follows. It is obvious from particular examples that
$-2(L - L_{\max})$ does require modification to $O(1/n)$ (so also does the alternative quadratic
expression). In principle a general correction to $O(1/n)$ could be worked out by the above
methods, though we have seen that corrections at this order are rather formidable to
evaluate by the present general approach. Particular problems, e.g. in time-series analysis,
might be somewhat more tractable if handled individually.

By an obvious extension of the argument, the criterion $-2(L'_{\max} - L_{\max})$, where
$L'_{\max}$ refers to a smaller subclass of the unknown parameters than $L_{\max}$, is equivalent
in distribution to a $\chi^2$, also up to and including $O(1/\sqrt{n})$.


I am indebted to Mrs A. Linnert for assistance with the computations for Table 1. 
1. INTRODUCTION


For many experimenters the most commonly used statistical tests are those for comparing 
sample means and sample variances. The test used for the equality of the means of k groups 
of observations is usually the analysis of variance test (or equivalently the t test when k = 2), 
whilst the test used for the equality of variances of $k$ groups of observations is Bartlett's
(1937) modification of the Neyman-Pearson (1931) $L_1$ test (or the equivalent $F$ test when
$k = 2$).

The tests mentioned are derived on a number of assumptions, in particular, that the 
observations are normally distributed. Usually, however, since little is known of the 
populations from which the samples are drawn, these tests are used, of necessity, as if the 
assumption of normality could be ignored. So far as comparative tests on means are 
concerned it appears (perhaps rather surprisingly) that this practice is largely justifiable, 
for thanks to the work of Pearson (1931), Bartlett (1935), Geary (1947), Gayen (1950a, b),
David & Johnson (1951a, b) there is abundant evidence that these comparative tests on
means are remarkably insensitive to general* non-normality of the parent population.

It would appear, however, that this remarkable property of ‘robustness’ to non-normality 
which these tests for comparing means possess, and without which they would be much less 
appropriate to the needs of the experimenter, is not necessarily shared by other statistical 
tests, and in particular is not shared by the tests for equality of variances, mentioned above. 

The sensitivity to non-normality of the tests for comparing two variances was first pointed
out by E. S. Pearson (1931) whose findings were confirmed by Geary (1947), Finch (1950)
and Gayen (1950a). These authors showed that this test is particularly sensitive to changes
in $\gamma_2$ from the normal theory value of zero. (The notation $\gamma_1$, $\gamma_2$, etc., will
be used for the standardized parent cumulants. In Karl Pearson's notation
$\gamma_1 = \sqrt{\beta_1}$, $\gamma_2 = \beta_2 - 3$.) In the present paper it is shown that the
sensitivity is even greater when the number of variances to be compared exceeds two, and, indeed,
that the sensitivity to non-normality of the $L_1$ criterion or the equivalent Bartlett test can be
of the same order of magnitude as the sensitivity of criteria such as $b_2$ specifically designed
to test normality.

It is further shown that the difference in sensitivity of the test on means on the one hand 
and the test on variances on the other arises from an essential difference in the nature of 
these two types of tests and this suggests how the extreme sensitivity to non-normality 
of the variance tests can be remedied.


* By ‘general’ parent non-normality is meant that the departure from normality, in particular
skewness, is the same in the different groups, as could usually be assumed when the data were from 
an experiment in which the groups corresponded with different applied treatments to be compared. 
In tests in which sample means are compared, general skewness tends to be cancelled out; larger effects 
are found, however, if the skewness is in different directions in the different groups. 










G. E. P. Box 319 


2. DISTRIBUTION OF VARIANCE TEST FOR LARGE SAMPLES 


Suppose there are $k$ groups of independently distributed observations with $n_t$ in the $t$th
group and $\sum n_t = N$. The $i$th observation of the $t$th group is denoted by $y_{ti}$ and the
usual estimate of variance in the $t$th group, having $\phi_t = n_t - 1$ degrees of freedom, by
$s_t^2$. The average of the estimated variances is denoted by $s^2$, where
$s^2 = \sum \phi_t s_t^2 / \Phi$ and $\Phi$ is the total number of degrees of freedom
$\sum \phi_t = N - k$.

The $L_1$ criterion due to Neyman & Pearson (1931) tests the hypothesis that the
variances in the groups are all equal, given that the observations are normally distributed.
Alternatively, $M_1$, the logarithmic modification due to Bartlett (1937), may be employed:

$$M_1 = \Phi \ln s^2 - \sum_t \phi_t \ln s_t^2. \qquad (1)$$
t 


Following Neyman & Pearson (1931) it may be shown that when the appropriate null
hypothesis is true and on the assumption of parent normality, $M_1$ is distributed in large
samples as $\chi^2_{k-1}$ (where $\chi^2_{k-1}$ means $\chi^2$ with $k - 1$ degrees of freedom).
This is the basis of the test proposed by Bartlett (1937). He shows that, even for fairly small
samples, to a close approximation $M_1$ is distributed as $(1 + A)\chi^2_{k-1}$, where $A$, an
adjustable constant which tends to zero for large values of $\phi_t$, is given by

$$A = \frac{1}{3(k-1)} \left\{ \sum_t \frac{1}{\phi_t} - \frac{1}{\Phi} \right\}. \qquad (2)$$
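The criterion and its correction can be computed directly from the group variances and their degrees of freedom. The following is a minimal sketch, assuming the usual forms $M_1 = \Phi \ln s^2 - \sum \phi_t \ln s_t^2$ and $A = \{\sum 1/\phi_t - 1/\Phi\}/\{3(k-1)\}$; the function names `bartlett_m1` and `bartlett_a` are hypothetical.

```python
import math


def bartlett_m1(variances, dfs):
    """M1 = Phi * ln(s^2) - sum_t phi_t * ln(s_t^2), with s^2 the pooled variance."""
    total_df = sum(dfs)
    pooled = sum(f * s2 for f, s2 in zip(dfs, variances)) / total_df
    return total_df * math.log(pooled) - sum(
        f * math.log(s2) for f, s2 in zip(dfs, variances)
    )


def bartlett_a(dfs):
    """Correction A = (sum_t 1/phi_t - 1/Phi) / (3 * (k - 1))."""
    k = len(dfs)
    return (sum(1.0 / f for f in dfs) - 1.0 / sum(dfs)) / (3.0 * (k - 1))


# Two groups with 10 degrees of freedom each and variances 1 and 4.
m1 = bartlett_m1([1.0, 4.0], [10, 10])
a = bartlett_a([10, 10])
# On normal theory, m1 / (1 + a) is referred to chi-square with k - 1 = 1 degree of freedom.
print(m1, a, m1 / (1.0 + a))
```

Equal variances give $M_1 = 0$, and $A$ shrinks as the degrees of freedom grow, which is a quick sanity check on any implementation.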
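It is convenient to define the quantity

$$d = \left(1 + \frac{1}{2}\,\frac{\phi}{\phi + 1}\,\gamma_2\right)^{-1}. \qquad (3)$$

When $\phi$ tends to infinity this becomes

$$\delta = (1 + \tfrac12\gamma_2)^{-1}. \qquad (4)$$

Thus $\delta$ is an alternative measure of kurtosis and the quantities $\beta_2$, $\gamma_2$ and
$\delta$ are related as follows:

 beta_2      1       2      3      6      inf
 gamma_2    -2      -1      0      3      inf
 delta      inf      2      1      0.4    0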
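The relation between the three measures of kurtosis is easy to tabulate. The sketch below assumes $\gamma_2 = \beta_2 - 3$, $\delta = (1 + \tfrac12\gamma_2)^{-1}$ and, for finite degrees of freedom $\phi$, $d = \{1 + \tfrac12\,\phi\gamma_2/(\phi+1)\}^{-1}$; the function names are hypothetical.

```python
import math


def delta_from_gamma2(gamma2):
    """Limiting kurtosis measure delta = 1 / (1 + gamma2 / 2); infinite at gamma2 = -2."""
    denom = 1.0 + 0.5 * gamma2
    return math.inf if denom == 0.0 else 1.0 / denom


def d_from_gamma2(gamma2, phi):
    """Finite-sample factor d = 1 / (1 + (phi / (phi + 1)) * gamma2 / 2)."""
    return 1.0 / (1.0 + 0.5 * phi / (phi + 1.0) * gamma2)


# Finite entries of the table relating beta_2, gamma_2 and delta.
for beta2 in (1, 2, 3, 6):
    gamma2 = beta2 - 3
    print(beta2, gamma2, delta_from_gamma2(gamma2))
```

For any fixed $\gamma_2$ the factor $d$ approaches $\delta$ as $\phi$ increases, so the two may be used interchangeably in large samples.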


Denote by $s^2_\phi(\gamma_2)$ an estimate of variance having $\phi$ degrees of freedom and based
on observations drawn from a population with finite cumulants whose measure of kurtosis is
$\gamma_2$. Then $s^2_\phi(0)$ is an estimate from a normal population. The estimate
$s^2_\phi(\gamma_2)$ has mean $\sigma^2$ and variance $2\sigma^4/(d\phi)$. Also from the general
expressions for the higher cumulants of such an estimate it is seen that as $\phi$ tends to
infinity the distribution of $s^2_\phi(\gamma_2)$ tends to normality, and for sufficiently large
values of $\phi$, $s^2_\phi(\gamma_2)$ is distributed like $s^2_{d\phi}(0)$.

Denote by $M_1(0; \phi_1, \phi_2, \ldots, \phi_k)$ the criterion calculated from normally
distributed observations and by
$M_1(\gamma_{21}, \gamma_{22}, \ldots, \gamma_{2k}; \phi_1, \phi_2, \ldots, \phi_k)$ the criterion
calculated from $k$ groups of observations drawn from populations with measures of kurtosis
$\gamma_{21}, \gamma_{22}, \ldots, \gamma_{2k}$. Then for large samples

$$\Phi^{-1} M_1(\gamma_{21}, \gamma_{22}, \ldots, \gamma_{2k}; \phi_1, \phi_2, \ldots, \phi_k)$$

is distributed as

$$\ln\Big\{\sum_t \phi_t\, s^2_{d_t\phi_t}(0)/\Phi\Big\} - \sum_t \frac{\phi_t}{\Phi} \ln s^2_{d_t\phi_t}(0). \qquad (5)$$
When the samples are sufficiently large we may write $\delta_t$ for $d_t$, and in the important
case where the kurtosis in each of the sampled populations is the same, so that
$\delta_t = \delta$ for all $t$, $M_1(\gamma_2; \phi_1, \phi_2, \ldots, \phi_k)$ is distributed as

$$\delta^{-1}\Big\{\delta\Phi \ln s^2_{\delta\Phi}(0) - \sum_t \delta\phi_t \ln s^2_{\delta\phi_t}(0)\Big\} = \delta^{-1} M_1(0; \delta\phi_1, \delta\phi_2, \ldots, \delta\phi_k). \qquad (6)$$

Consequently $M_1(\gamma_2)$ is distributed asymptotically not as $\chi^2_{k-1}$ but as
$(1 + \tfrac12\gamma_2)\chi^2_{k-1}$ for any parent distribution having finite cumulants. We see
therefore that $M_1$ is asymptotically biased if $\gamma_2$ is not zero. In particular, in large
samples the mean of the distribution curve of $M_1$ would be $(1 + \tfrac12\gamma_2)(k - 1)$
instead of $k - 1$ and the standard deviation $(1 + \tfrac12\gamma_2)\{2(k-1)\}^{1/2}$ instead of
$\{2(k-1)\}^{1/2}$. The discrepancy in means relative to the standard deviation would
thus become larger as $k$ was increased. Consequently divergences from normal theory
probabilities, known to be large when $k = 2$, would become progressively larger for $k > 2$.
Table 1 shows the true probability of exceeding the normal-theory 5 % significance levels in
large samples for various values of $k$ and $\gamma_2$ when the null-hypothesis is true.


Table 1. True percentage chance of exceeding 5 % normal-theory significance level of $M_1$
in large samples from non-normal populations for various values of $\gamma_2$

                          No. of groups = k
 gamma_2     2       3       5       10       20        30
    2       16.6    22.4    31.5    48.9     71.8      84.9
    1       11.0    13.6    17.6    25.7     38.9      49.8
    0        5.0     5.0     5.0     5.0      5.0       5.0
   -1        0.56    0.25    0.08    0.010    0.0004    0.00001
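The entries of Table 1 follow from the asymptotic result that $M_1$ behaves as $(1 + \tfrac12\gamma_2)\chi^2_{k-1}$: the chance of exceeding the normal-theory 5 % point $c$ is $P\{\chi^2_{k-1} > c/(1 + \tfrac12\gamma_2)\}$. The sketch below reproduces this calculation with the Python standard library only; the helper names are hypothetical, and the $\chi^2$ tail is evaluated by a standard recurrence for the incomplete gamma function rather than by any method used in the original paper.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for integer df, via the recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with y = x / 2."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)                # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))     # Q(1/2, y)
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi2_upper_point(alpha, df):
    """Value c with P(chi2_df > c) = alpha, found by bisection."""
    lo, hi = 0.0, 10.0 * df + 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def true_rejection_rate(k, gamma2, alpha=0.05):
    """Large-sample chance that M1 exceeds the normal-theory alpha point (gamma2 > -2)."""
    c = chi2_upper_point(alpha, k - 1)
    return chi2_sf(c / (1.0 + 0.5 * gamma2), k - 1)


for gamma2 in (2, 1, 0, -1):
    print(gamma2, [round(100 * true_rejection_rate(k, gamma2), 2) for k in (2, 3, 5, 10, 20, 30)])
```

Because the scale factor multiplies both the mean and the standard deviation, the shift grows relative to the spread as $k$ increases, which is why the rows for $\gamma_2 \neq 0$ diverge from 5 % so quickly.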





In the case when the null-hypothesis is not true it will be seen that changes in the level of 
the criterion due to non-normality and due to real differences in the group variances will 
tend to be ‘confounded’. With leptokurtic populations the criterion will tend to show 
differences when none exist, while with platykurtic populations real differences will tend 
to be masked. 

The behaviour of the test for means is in sharp contrast to that for variances discussed 
above. The appropriate test for the hypothesis—that the means in the groups are all equal, 
given that the variances are all equal, and that the observations are normally distributed— 
is the ‘within and between groups’ analysis of variance test. In its commonest form the 
criterion used is 


$$F = \frac{\text{between groups mean square}}{\text{within groups mean square}}
= \frac{\sum_t n_t (\bar y_t - \bar y)^2/(k-1)}{\sum_t \sum_i (y_{ti} - \bar y_t)^2/(N-k)}.$$

Alternatively, to correspond with $M_1$, we could employ a form obtained from the logarithm
of the likelihood ratio

$$M_2 = (N - 1)\ln\{1 + (k-1)F/(N-k)\}.$$


For large samples Neyman & Pearson (1931) showed that assuming normality a criterion
equivalent to $M_2$ would be distributed as $\chi^2_{k-1}$. With certain not very severe
restrictions on the parent population it is possible to show that this result is true whether the
population is normal or not. Thus for large samples the criterion $M_2$ for means tends to follow
the normal-theory distribution under conditions of non-normality. This explains to some extent the
insensitivity to non-normality found by Pearson and other workers mentioned above. In
contrast, the criterion $M_1$ for variances follows a distribution directly dependent on
$\gamma_2$ however large are the samples.

It is perhaps unfortunate that the test for homogeneity of means, and the test for homo- 
geneity of variances in the particular case of two groups, can each be brought to the form of 
a variance ratio test. This has sometimes led to apparently contradictory statements about 
the sensitivity of the ‘variance ratio’ to non-normality. As originally pointed out by Pearson
in 1931, the two criteria are really essentially different in character but follow the same
distribution when the parent is normal. The marked difference in sensitivity of the criteria
is well brought out by an example taken from Gayen's work (1950a). A test for homogeneity
of means of five groups of five observations leads to an $F$ test on 4 and 20 degrees of freedom;
on the normal assumption this will be distributed in the same form as the $F$ criterion used
to compare the variances in two independent groups of 5 and 21 observations. Gayen shows,
for example, that for a certain type of non-normal distribution in which $\gamma_1 = 0$ and
$\gamma_2 = 2$ the true probability of exceeding the 5 % point for the test on means would be
4.5 % (a quite trivial discrepancy), whilst that for the test on variances would be 10.2 %.


3. DISTRIBUTION OF M, IN SMALL SAMPLES FOR CERTAIN 
NON-NORMAL PARENT DISTRIBUTIONS 


It will now be shown that parent distributions exist for which relations similar to (6) are 
obtained even for small samples. 


(3-1) Population means known 

For the moment it will be assumed that the means of the & sampled populations are 
known. Denote by s*,, an estimate of variance having n degrees of freedom calculated from 
the n deviations from the known group mean 7. The criterion calculated from such estimates 
is denoted by M_,. If the distribution of the observations is such that s*,(y,) is distributed 
like s?,,,(0) for any sample size then ns*,,(y,) is distributed like d-1y}, 0%, which implies that 
(y—7)? is distributed like d-1x30*. This is true if 

$$p(y) = (\delta/2\sigma^2)^{\delta/2}\,\{\Gamma(\tfrac12\delta)\}^{-1}\,|y - \eta|^{\delta - 1} \exp\{-\delta(y - \eta)^2/2\sigma^2\} \quad (-\infty < y < +\infty;\ 0 < \delta < \infty). \qquad (7)$$

This is a double $\chi$ distribution, that is to say, it is a symmetrical distribution of the form
of two $\chi$ distributions having $\delta$ degrees of freedom ‘back to back’. It has mean $\eta$,
variance $\sigma^2$, $\gamma_1 = 0$ and $\gamma_2 = 2(1 - \delta)/\delta$ (that is,
$\delta^{-1} = 1 + \tfrac12\gamma_2$ as previously defined). When $\delta = 1$ the term in
$|y - \eta|$ vanishes and the distribution is normal. When $\delta$ is less than 1, $\gamma_2$ is
greater than zero, and a leptokurtic distribution results. When $\delta$ is greater than 1,
$\gamma_2$ is less than zero and a bimodal platykurtic distribution is obtained. For such parent
distributions it is easily seen by the previous argument that


M.1(Y23 2, Ng; ---,%) is distributed exactly as d-1M_,(0; dn,, dng, ..., dn,), (8) 
where ,, Ms, ..., 2 refer to the sample sizes. 
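A double $\chi$ variate is straightforward to simulate, which gives a quick check on its stated moments. The sketch below assumes only what the text states, namely that $(y - \eta)^2$ is distributed like $\delta^{-1}\chi^2_\delta\sigma^2$ with a symmetric choice of sign; the function names are hypothetical, and $\chi^2_\delta$ is drawn as a gamma variate with shape $\delta/2$ and scale 2.

```python
import random


def gamma2_double_chi(delta):
    """Kurtosis measure of the double chi distribution: gamma2 = 2 * (1 - delta) / delta."""
    return 2.0 * (1.0 - delta) / delta


def sample_double_chi(delta, eta=0.0, sigma=1.0, rng=random):
    """Draw y with (y - eta)^2 ~ sigma^2 * chi2_delta / delta and a random sign."""
    chi2 = rng.gammavariate(delta / 2.0, 2.0)   # chi-square with delta degrees of freedom
    magnitude = sigma * (chi2 / delta) ** 0.5
    return eta + magnitude if rng.random() < 0.5 else eta - magnitude


def sample_variance_about(values, centre):
    """Mean squared deviation about a known centre (the 'means known' case)."""
    return sum((v - centre) ** 2 for v in values) / len(values)


rng = random.Random(1)
ys = [sample_double_chi(0.5, rng=rng) for _ in range(200000)]
# delta = 0.5 corresponds to a leptokurtic parent with gamma2 = 2 and unit variance.
print(gamma2_double_chi(0.5), sample_variance_about(ys, 0.0))
```

Setting `delta=1.0` reduces the sampler to a normal variate, which matches the remark that the term in $|y - \eta|$ then vanishes.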
(3-2) Population means unknown 


In 1931 Le Roux carried out a very painstaking research into the distribution of the 
variance calculated from small samples drawn from non-normal distributions of Pearson 
type. He studied changes in the distribution of $s^2$ as the parent distribution changed, by
calculating its first four moments from the moments of the parent distribution, and confirmed
his findings by extensive sampling experiments. One of his discoveries was that in
addition to the normal distribution a series of non-normal distributions existed (which he
called $D(s^2)$ III distributions) for which the first four moments of $s^2$ were almost exactly
those of a Pearson type III curve. These $D(s^2)$ III distributions show kurtosis accompanied
by marked skewness. A selection of these adapted from Le Roux's table is shown below.

D(s^2) III distributions

 gamma_1^2   0.00    0.10    0.20    0.40    0.60    0.80    1.00    1.20    1.40
 gamma_2     0.00   -0.04    0.03    0.24    0.48    0.73    0.99    1.25    1.51

For such curves Le Roux deduced that to a close approximation $s^2_\phi(\gamma_2)$ was distributed
as $s^2_{d\phi}(0)$ for any sample size (where as before
$d^{-1} = 1 + \tfrac12\,\frac{\phi}{\phi+1}\,\gamma_2$), a conclusion supported by
his sampling experiments. We may conclude therefore that the relation (5) is approximately
true for samples of any size chosen from such populations.

In particular, if the degree of kurtosis is the same in each of the populations sampled and
each variance is based on the same number $\phi$ of degrees of freedom, then for $D(s^2)$ III
populations

$M_1(\gamma_2; \phi, \phi, \ldots, \phi)$ is distributed approximately as $d^{-1} M_1(0; d\phi, d\phi, \ldots, d\phi)$.

Using Bartlett's method we see that for $D(s^2)$ III populations therefore
$M_1(\gamma_2; \phi, \phi, \ldots, \phi)$ is distributed approximately not as
$(1 + A)\chi^2_{k-1}$ but as

$$d^{-1}(1 + A d^{-1})\,\chi^2_{k-1}, \qquad (9)$$

where $A$ is the correcting factor of order $\phi^{-1}$ defined in (2). We note in passing that
departure from normality of this form would not need to be very great or the sample size very big
before the effect of the constant $d^{-1}$ completely swamped the effect of the correction term $A$.


4. TWO GROUPS OF OBSERVATIONS 


When $k = 2$ we may use the variance ratio $F$ as an alternative form of the $M_1$ criterion.
Using the same argument as before and a corresponding notation it is apparent that for
samples of any size drawn from double $\chi$ distributions with measures of kurtosis
$\gamma_{21}$ and $\gamma_{22}$

$F(\gamma_{21}; \gamma_{22}; n_1; n_2)$ is distributed exactly as $F(0; \delta_1 n_1, \delta_2 n_2)$, (10)

whilst for samples of any size from $D(s^2)$ III populations

$F(\gamma_{21}; \gamma_{22}; \phi_1; \phi_2)$ is distributed approximately as $F(0; d_1\phi_1, d_2\phi_2)$. (11)

These results, which are, of course, equivalent asymptotically, would be expected to apply
for any population when the sample sizes were sufficiently large. To provide some indication
of how large the samples would have to be, comparison was made between the results given by
(11) and the more exact values calculated by other authors. Gayen (1950b) and Finch (1950)
have considered the distribution of the variance ratio or the equivalent $z$ criterion for
populations defined by the first few terms of Edgeworth's series. Both these authors note
that for these populations the effect of changes in $\gamma_1$ is small but large discrepancies
are produced by changes in $\gamma_2$. In Fig. 1, $\gamma_1$ is assumed zero and the continuous
lines represent the probability, as determined by these authors, of exceeding the normal theory
5 % point for various values of $\gamma_2$. The values of $\gamma_2$ are assumed to be the same in
each of the two populations. The first graph illustrates the case $\phi_1 = 24$, $\phi_2 = 60$
considered by Finch, and the second the case $\phi_1 = 4$, $\phi_2 = 20$ considered by Gayen. The
dotted lines are the values given by equation (11) above. A further comparison is supplied by
Pearson's (1931) sampling experiments. Pearson drew 500 pairs of samples of 5 and 20 from six
experimental populations and calculated a criterion equivalent to $F$; this corresponds very
nearly to the case $\phi_1 = 4$, $\phi_2 = 20$ above ($\phi_2$ is actually 19 instead of 20). In
his table Pearson shows the number of pairs of samples for which the criterion exceeds the 4.86 %
point (which again is very close to the 5 % point considered above). These numbers reduced to
percentages are plotted as circles about the second line.

Fig. 1. Percentage probability $P$ of exceeding normal-theory 5 % level plotted against value of $\gamma_2$.


The agreement between the simple approximation (11) and the values found by Finch
is very close, as would be expected for the comparatively large sample sizes used. The extent
of agreement with Gayen's results is rather surprising when the small size of one of the
samples is remembered. In both approaches it is assumed, however, that the higher cumulants
are finite. The marked departure from both lines which occurs when this is not so is shown
by Pearson's sampling experiment from the type VII (i.e. the Student $t$) distribution for
which $\gamma_2 = 4.1$ and higher cumulants are infinite.


5. SAMPLING EXPERIMENTS WITH A $D(s^2)$ III DISTRIBUTION

Confirmation of (9) in the case $k = 20$ was obtained from an experiment in which samples
were drawn from a population nearly of type $D(s^2)$ III form. Le Roux drew 1000 samples
of $n = 20$ observations from such a distribution in which $\gamma_1^2 = 1.0$ and
$\gamma_2 = 0.8$. (The distribution was actually a Pearson type I.) He calculated the quantity
$(n - 1)s^2/n$ for each of the 1000 samples and published the resulting frequency distribution.
The samples were grouped but the grouping interval was fairly fine. In the present work Le Roux's
observed distribution of $s^2$ has been resampled, the sampling being without replacement. In each
experiment twenty-five groups of twenty variance estimates were reconstructed and from
these, twenty-five values of $M_1$ were calculated. The experiment was then repeated using
Le Roux's observed distribution of variance estimates for samples of $n = 5$ from the same
parent distribution.


Table 2. Characteristics of observed distributions of twenty-five values of $M_1$ for $k = 20$ groups,
with (a) $n = 20$ observations and (b) $n = 5$ observations, drawn from a Pearson type I
parent distribution for which $\gamma_1^2 = 1.0$, $\gamma_2 = 0.8$

                              Expected on          Observed           Expected from
                              normal theory                           equation (9)
                              (chi-square                             (chi-square
                              approximation)                          approximation)
 (a) n = 20
   M_1 distributed as:        1.018 chi2_19                           1.415 chi2_19
   Mean                       19.4                 28.6 ± 1.9*        26.9
   Variance                   39.4                 93.3 ± 30.3†       76.1
   No. out of 25 significant at P % point:
     P = 10                   2.5                  13 (52 %)          10.3 (41.2 %)
     P = 5                    1.25                 9 (36 %)           7.5 (30.0 %)
     P = 1                    0.25                 4 (16 %)           3.2 (12.8 %)
     P = 0.1                  0.025                1 (4 %)            0.9 (3.6 %)
 (b) n = 5
   M_1 distributed as:        1.088 chi2_19                           1.473 chi2_19
   Mean                       20.7                 28.0 ± 1.8*        28.0
   Variance                   44.9                 79.1 ± 25.6†       82.4
   No. out of 25 significant at P % point:
     P = 10                   2.5                  13 (52 %)          9.6 (38.2 %)
     P = 5                    1.25                 9 (36 %)           6.8 (27.1 %)
     P = 1                    0.25                 2 (8 %)            2.8 (11.0 %)
     P = 0.1                  0.025                0 (0 %)            0.7 (2.9 %)























* The quantity following the ± signs is the approximate standard error of the mean obtained from
S.E.(Mean $M_1$) $= \{V(M_1)/25\}^{1/2}$.

† The quantity following the ± signs is a rough estimate of the standard error of the variance of
$M_1$ obtained by assuming that $M_1$ is approximately distributed in Pearson type III form and
substituting $\gamma_2 = 12/(k-1)$ in the approximate formula,
S.E.$\{V(M_1)\} = V(M_1)\{(\gamma_2 + 2)/25\}^{1/2}$.


The two observed distributions of $M_1$ are shown in the sixth and seventh columns of the
table in the Appendix. The main features of the observed distributions are shown in Table 2.
It will be seen that the means and variances of the values of $M_1$ are in close agreement with
the values anticipated from equation (9) and differ very markedly from those expected on
normal theory. The excess of ‘significant’ values is also verified. For example, both for
samples of $n = 20$ and $n = 5$ observations, nine out of twenty-five (or 36 %) of the values
were significant at the 5 % point.
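The expected means and variances quoted in Table 2 can be checked from the scale factors alone, since a multiple $c\,\chi^2_{k-1}$ has mean $c(k-1)$ and variance $2c^2(k-1)$. The sketch below assumes the forms $A = \{\sum 1/\phi_t - 1/\Phi\}/\{3(k-1)\}$, $d^{-1} = 1 + \tfrac12\,\phi\gamma_2/(\phi+1)$ and the multiplier $d^{-1}(1 + A d^{-1})$ of equation (9); the function names are hypothetical.

```python
def bartlett_a_equal(k, phi):
    """Bartlett's A for k groups with phi degrees of freedom each."""
    return (k / phi - 1.0 / (k * phi)) / (3.0 * (k - 1))


def expected_m1(k, n, gamma2):
    """Return {label: (multiplier, mean, variance)} of the chi-square approximation
    to M1, on normal theory and on the assumed form of equation (9)."""
    phi = n - 1
    a = bartlett_a_equal(k, phi)
    d_inv = 1.0 + 0.5 * phi / (phi + 1.0) * gamma2
    df = k - 1
    out = {}
    for label, mult in (("normal", 1.0 + a), ("eq9", d_inv * (1.0 + a * d_inv))):
        out[label] = (mult, mult * df, 2.0 * df * mult ** 2)
    return out


for n in (20, 5):
    for label, (mult, mean, var) in expected_m1(k=20, n=n, gamma2=0.8).items():
        print(n, label, round(mult, 3), round(mean, 1), round(var, 1))


def se_of_variance(variance, k, reps=25):
    """Rough S.E. of V(M1) from the second footnote to Table 2, using gamma2 = 12 / (k - 1)."""
    return variance * ((12.0 / (k - 1) + 2.0) / reps) ** 0.5
```

With these assumptions the multipliers come out close to the tabulated 1.018 and 1.415 for $n = 20$ and 1.088 and 1.473 for $n = 5$, which is some reassurance that the reconstructed formulas match those used for the table.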
6. SAMPLING EXPERIMENTS WITH OTHER DISTRIBUTIONS


Besides the population of the $D(s^2)$ III type, Le Roux considered a variety of other
populations, mostly using sampling results used by E. S. Pearson in earlier investigations, to
determine empirically the distribution of the sample variance. In the present investigation
Le Roux's distributions derived from the following parent distributions have been resampled
in the manner already described:

 Pearson type              II            III            I            IV           VII
 (gamma_1^2, gamma_2)*     (0.0, -0.5)   (0.5, 0.75)    (1.0, 0.8)   (1.2, 2.8)   (0.0, 4.1)

From each distribution, twenty-five sets of twenty samples of $s^2$ have been obtained, and
for these twenty-five values of $M_1$ calculated. These observed values of $M_1$ are set out
in the Appendix. Table 3 shows the values of the means and variances of $M_1$ observed
and those expected on normal theory, together with the number of values exceeding the
normal theory significance points.

It is seen that large discrepancies from the normal theory distributions occur, and the 
discrepancies are in the directions expected. The discrepancies are larger when n = 20 than 
when n = 5, and for the populations studied the asymptotic result seems to give an upper 
limit to the discrepancy which is approached as the sample size is increased. The asymptotic 
result appears to be more rapidly approached when kurtosis is accompanied by skewness 
till for the skew D(s?) III populations, the asymptotic value is attained even for small sample 
sizes. The asymptotic result is approached more slowly when kurtosis is large and for the 
very leptokurtic type IV and type VII distributions, sample sizes greater than twenty 
appear to be needed before the asymptotic values are closely approached. 


7. $M_1$ AS A TEST FOR NORMALITY

The seriousness of the discrepancies to be expected, even in small samples, may be
appreciated by considering $M_1$ as a test for normality.

Suppose a sample of $N$ observations is drawn from a population distributed in the double
$\chi$ form of equation (7). We shall assume that the mean $\eta$ of the population is known, so
that without loss of generality we may take it to be zero. We wish to test the hypothesis $H_0$
that the distribution is normal, i.e. that $\delta = 1$, $\sigma > 0$, against the class of
alternatives $H_1$ which specify that the distribution is not normal, i.e. that $\delta \neq 1$,
$\sigma > 0$. Then, following through the method set out by Neyman & Pearson (1933) for the
selection of best critical regions, it is
easy to show that the appropriate criterion is M_, = Nln(Zy?/N)—ZIny’, and that if 
M ,(«) and M ,(1—«) are the « and 1—« significance points for M_,, then the inequalities 

M,>M_(«%) and M,<M,(1—a) (12) 

define a pair of common best critical regions for testing H, against the alternative hypotheses 
d<1 (y,>0) and d>1 (y, <0), respectively, where in each case o is unknown. Thus in the 
particular case when there is only one observation per group, UV , is itself the most sensitive 
criterion possible for testing for a certain type of non-normality. Non-normal distributions 
of the class considered, which are defined by equation (7), would rarely be expected to 
approximate to the distribution of actual data, but the above investigation suggests that 
we may form some idea of the sensitivity to non-normality of M_, for small samples by seeing 
how good it is as a test for non-normality. 


* The actual standardized cumulants given above differ slightly in some cases from the theoretical 
values due to the finite size (10,000) of the populations which Le Roux sampled. 
(7-1) Comparison of M_, with criteria for testing normality
The usual test criterion for kurtosis is $b_2$, the sample estimate of the moment coefficient
$\beta_2$. Geary (1935a) proposed as a new test for kurtosis the criterion


a = (sample mean deviation)/(sample standard deviation).


This criterion had the advantage that unlike $b_2$, the distribution when the null-hypothesis
of normality was true could be found fairly readily. Further investigation was carried out
by Geary (1935b, 1947) to discover whether the power of this test to pick out a departure
from normality was comparable to that of $b_2$. Pearson (1935) studied this question by means
of sampling experiments. One such experiment consisted of drawing ten samples each of 
seventy-six observations from each of three symmetrical populations: 








Description            gamma_1    gamma_2*
Rectangular              0.0       -1.2
Pearson type VII         0.0        4.1
Double exponential       0.0        2.9

* As before the actual standardized cumulants given above differ slightly from the theoretical values 
for the last two populations due to the finite size (10,000) of the population sampled. 


Pearson used his results to show the relative frequency with which the test criteria $b_2$
and $a$ detected departures from normality and also to show the correlation between the
results of the two tests. He published not only his conclusions but also the actual samples
which he used and I have used these again to calculate the corresponding values for $M_1$.
In my calculation of $M_1$ I have used the population mean (zero in each case), whereas in
Pearson's calculations of $b_2$ and $a$ the sample mean was used. In samples of seventy-six,
however, the discrepancy this introduces should not be large. A further difficulty arises
due to the fact that the samples are grouped to the nearest unit, resulting in a frequency
class having zero deviation from the mean which would give rise to an infinite value of $M_1$.
To overcome this I have spaced the values evenly in the interval 0.0 to 0.5. Thus if there were
$n$ values in the zero frequency class these have been assumed to be at $1/(4n), 3/(4n), \ldots,
(2n-1)/(4n)$. Fig. 2a shows the thirty values of $M_1$ plotted against the corresponding
values of $b_2$ and Fig. 2b shows the same values plotted against the values of $a$. It is seen that
in each case marked correlation occurs, and if these graphs are compared with Pearson's
Fig. 4 in which he plots $a$ against $b_2$ it will be seen that the extent of correlation between
$M_1$ and $b_2$ and between $M_1$ and $a$ is similar to that between $a$ and $b_2$. So far as is known, no
tables are available for the significance points of $M_1$ (or for the equivalent criterion $L_1$) for
the case of seventy-six groups with variance estimates each based on 1 degree of freedom.
Also the asymptotic approximations of Bartlett (1937), Hartley (1940) and Box (1949)
break down for large $k$ and small numbers of degrees of freedom. Approximate significance
points for $M_1$ were therefore calculated by considering the distribution of $L_1^{-1} = \exp\{M_1/k\}$
which ranges from 1 to $\infty$. General expressions for the moments of $L_1$ were given by
Neyman & Pearson (1931). From these the first four moments of $L_1^{-1}$ were calculated and
the appropriate Pearson type curve to approximate the distribution was found to be of
type VI. It was assumed therefore that $L_1^{-1}$ was approximately distributed as $cF_{\nu_1,\nu_2}$, the
curve starting at 1. By equating the first three moments of the two curves the values
$c = 2.4705$, $\nu_1 = 67.1998$, $\nu_2 = 45.2904$ were obtained, and the approximate significance
points calculated by two-way, four-point harmonic interpolation in the F tables of
Merrington & Thompson (1943). The approximate significance points thus found were:

        Lower                 Upper
     1 %      5 %         5 %       1 %
    64.1     72.3       121.2     133.3

Table 4 is an extension of Pearson's Table VII showing the number of values of $b_2$, $a$ and
$M_1$ exceeding their 5 and 1 % significance levels.


Table 4. Samples of seventy-six; comparison of criteria

Distribution                  Rectangular          Pearson type VII      Double exponential
                              gamma_1 = 0.0,       gamma_1 = 0.0,        gamma_1 = 0.0,
                              gamma_2 = -1.2       gamma_2 = 4.1         gamma_2 = 2.9
                              b_2    a    M_1      b_2    a    M_1       b_2    a    M_1
No. of samples:
  within 5 % limits            0     0     3        2     3     5         1     1     1
  between 5 and 1 % limits     0     2     3        4     1     2         5     1     1
  beyond 1 % limits           10     8     4        4     6     3         4     8     8

For the rectangular and type VII distributions the criterion appears to be less sensitive
to kurtosis than $b_2$ or $a$; for the double exponential distribution, however, $M_1$ appears to
be as sensitive as $a$ and more sensitive than $b_2$. It is not of course contended that $M_1$ is
necessarily a practical test for kurtosis; we are concerned only to show that since $b_2$ and $a$
are presumably very sensitive criteria for detecting kurtosis, $M_1$ is much more sensitive
even when the sample size is small than we should desire a test for homogeneity of variances
to be.

In this connexion it is of some interest to reconsider some data discussed by Bartlett &
Kendall (1946). The results appear as a two-way table with fifteen columns and three rows.
The entries in the table are the logarithms of variance estimates $s^2$, each based on about
48 degrees of freedom. The authors show that by taking logarithms the data are brought to
a suitable form for the application of analysis of variance. Their analysis is as follows:

                         D.F.     S.S.      M.S.
Between rows               2     0.2667    0.1333
Between columns           14     0.1047    0.0075
Residual                  28     0.1005    0.0036
Theoretical variance       -       -       0.0020

The first three items are calculated in the usual way, and the theoretical variance is
based on the normal theory value $\kappa_2(\ln s^2) \simeq 2/(\nu - 1)$. Compared with this theoretical
variance the residual mean square is significant ($P = 0.01$). It is deduced, therefore, that there
is heterogeneity in the residual variance, and methods for further analysing this residual
heterogeneity are discussed. Although familiarity with the distribution of their data (of
which clearly a large quantity was available) no doubt rendered quite legitimate in this
example the assumption by these authors of approximate normality, it is perhaps worth
while emphasizing the dependence of this test on that assumption. For in general

$\kappa_2(\ln s^2) \simeq 2/(\delta\nu - 1)$,

consequently a value for $\gamma_2$ of 1.57 would have made the residual and theoretical variances
exactly equal. Furthermore, since any value for the theoretical variance greater than
0.0024 would fail to show the residual variance significant at the 5 % level, any hypothesis
that the residual variance is homogeneous and $\gamma_2$ greater than 0.44 is not contradicted by
the data at this level of significance.


8. OTHER TESTS OF VARIANCE HOMOGENEITY 


The reason for the difference in sensitivity to non-normality between the criterion $M_0$ for
testing homogeneity of means and $M_1$ for testing homogeneity of variances can be seen as
follows. In the analysis of variance (F-form) the criterion for means may be written as $P/Q$,
where $P = \{\Sigma n_t(\bar{y}_t - \bar{y})^2\}/(k-1)$ is the between-groups mean square and $Q = s^2$ is the
within-groups mean square. When the null hypothesis is true, $P$ and $Q$ each provides an
unbiased estimate of $\sigma^2$ whether the observations are drawn from normal populations or not,
so that $Q$ is always a standard with which $P$ may be usefully compared. But if in $M_1$, the
criterion for variances, we write $s_t^2 = \sigma^2(1 + x_t)$ and formally expand the logarithms, we
obtain $M_1 = \tfrac12\Sigma\nu_t(x_t - \bar{x})^2$ plus terms in higher powers of $x_t$. For sufficiently large values
of $\nu_t$ we can ignore the higher powers of $x_t$ and we have $M_1/(k-1) \simeq p/q$, where

$p = \{\Sigma\nu_t(s_t^2 - s^2)^2\}/(k-1)$ and $q = 2\sigma^4$.
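
The expansion above can be checked numerically. The following is a minimal sketch, not the author's computation: the group variances and degrees of freedom are hypothetical, and the pooled $s^2$ is assumed to stand in for the unknown $\sigma^2$ in the quadratic term.

```python
import math

def bartlett_m1(s2, nu):
    """M_1 = N ln(pooled s^2) - sum nu_t ln s_t^2, with N = sum nu_t."""
    n_total = sum(nu)
    pooled = sum(v * s for v, s in zip(nu, s2)) / n_total
    return n_total * math.log(pooled) - sum(v * math.log(s) for v, s in zip(nu, s2))

def quadratic_term(s2, nu):
    """Leading term sum nu_t (s_t^2 - s^2)^2 / (2 sigma^4); pooled s^2 replaces sigma^2 (assumption)."""
    n_total = sum(nu)
    pooled = sum(v * s for v, s in zip(nu, s2)) / n_total
    return sum(v * (s - pooled) ** 2 for v, s in zip(nu, s2)) / (2.0 * pooled ** 2)

variances = [1.00, 1.05, 0.95, 1.02]   # hypothetical group variances
dof = [50, 50, 50, 50]                 # hypothetical degrees of freedom
m1 = bartlett_m1(variances, dof)
approx = quadratic_term(variances, dof)
print(round(m1, 4), round(approx, 4))
```

For variances this close together and this many degrees of freedom the two quantities differ by about one per cent, which is the sense in which the higher powers of $x_t$ can be ignored.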


Comparing these with the expressions above we see that asymptotically the $M_1$ test is like
an analysis of variance on the sample variances instead of sample means, but the quantity $p$
corresponding to the between-groups mean square is compared not with an estimate from the
internal evidence of the samples but with a theoretical value of the variance which is appropriate
only when the parent distribution is normal. In general, the variance of a variance estimate
$s_t^2$ is $\delta^{-1}\times 2\sigma^4/\nu_t$; consequently the appropriate value with which to compare $p$ would be
$\delta^{-1}\times 2\sigma^4$ and not $2\sigma^4$.

From the above discussion it will be seen that to obtain a criterion less dependent on $\gamma_2$,
information on the variation to be expected in the sample variances gathered from internal
evidence in the samples should be utilized. A test of variances should be 'studentized' for
the fourth moment just as a test of means is studentized for the second moment.
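
A small simulation illustrates why the normal-theory value is the wrong yardstick. This is a sketch under stated assumptions: the sample size, number of repetitions and seed are arbitrary choices, and the comparison relies on the relation $\delta^{-1} = 1 + \tfrac12\gamma_2$, which is assumed here because it agrees with the two-fifths figure quoted later for a rectangular parent ($\gamma_2 = -1.2$).

```python
import random

def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def variance_ratio(draw, sigma2, n=20, reps=4000, seed=1):
    """Variance of s^2 over repeated samples, divided by the normal-theory value 2*sigma^4/(n-1)."""
    rng = random.Random(seed)
    s2 = [sample_variance([draw(rng) for _ in range(n)]) for _ in range(reps)]
    return sample_variance(s2) / (2.0 * sigma2 ** 2 / (n - 1))

# Normal parent: ratio should be near 1.
normal_ratio = variance_ratio(lambda rng: rng.gauss(0.0, 1.0), sigma2=1.0)
# Rectangular parent on (0, 1), variance 1/12: ratio should be roughly two-fifths.
rect_ratio = variance_ratio(lambda rng: rng.random(), sigma2=1.0 / 12.0)
print(round(normal_ratio, 2), round(rect_ratio, 2))
```

With a rectangular parent the sample variances scatter far less than normal theory predicts, so a criterion that compares their scatter with $2\sigma^4$ is biased towards accepting homogeneity.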


(8.1) Other tests not utilizing information within groups
Other tests of variance homogeneity which do not utilize evidence on variance variability
within the samples are equally sensitive to non-normality. In fact for $D(s^2)$ III parent
distributions and, asymptotically, for all distributions with finite cumulants any criterion
$L\{s^2_{\nu_1}(\gamma_2), s^2_{\nu_2}(\gamma_2), \ldots, s^2_{\nu_k}(\gamma_2)\}$ will be distributed like $L\{s^2_{\delta\nu_1}(0), s^2_{\delta\nu_2}(0), \ldots, s^2_{\delta\nu_k}(0)\}$.
Examples of such tests are those proposed by Cochran (1941) and by Hartley (1950). Cochran's
criterion is $g = s^2_{\max}/\Sigma s_t^2$, whilst that proposed by Hartley is $F_{\max} = s^2_{\max}/s^2_{\min}$, where
$s^2_{\max}$ and $s^2_{\min}$ are the largest and smallest of the group variances $s_1^2, s_2^2, \ldots, s_t^2, \ldots, s_k^2$. Using
the same approximation as that adopted by Hartley it is a simple matter to calculate the
true chance of exceeding the normal theory significance points for a $D(s^2)$ III parent
population. For instance with $k$ groups of 21 observations drawn at random from such a
population in which $\gamma_2 = 1$ the approximate percentage chance of exceeding the normal
theory 5 % significance point would be:

             k = 5     k = 10    k = 20
M_1           17.5      25.2      37.3
F_max         17.3      23.3      30.3

The discrepancies using $F_{\max}$ are seen to be of similar magnitude to those found with $M_1$,
and similar results are to be expected using Cochran's $g$. The multivariate tests for the
constancy of the variance covariance matrix from one group of observations to another
(Wilks, 1932; Bishop, 1939; Box, 1949, 1950) would be expected to be equally dependent
on the assumption of multivariate normality.

Sensitivity to non-normality is found also with sequential tests. For example, Wald
(1947) discussed the use of the sequential likelihood ratio test of the hypothesis $H_0$ that
$\sigma^2 = \sigma_0^2$ when the alternative $H_1$ is that $\sigma^2 = \sigma_1^2$ and the observations $y_1, y_2, \ldots$ were drawn
from normal populations with known mean $\eta$. Suppose the observations were drawn, not
from a normal universe, but from the double $\chi$ population whose distribution is given by
equation (7). Then it will be found that the logarithm of the likelihood ratio $L(\gamma_2)$ is equal
to $\delta L(0)$ (where $L(0)$ is the logarithm of the likelihood ratio when the parent distribution is
normal). Consequently if the quantity $L(0)$ is calculated when $\gamma_2$ is not zero and referred to
limits $\ln\{\beta/(1-\alpha)\}$ and $\ln\{(1-\beta)/\alpha\}$, this is equivalent to referring the actual likelihood
$\delta L(0)$ to the limits $\delta\ln\{\beta/(1-\alpha)\}$ and $\delta\ln\{(1-\beta)/\alpha\}$. The actual risks of error of the two kinds
will therefore be $\alpha'$ and $\beta'$ chosen so that $\beta'/(1-\alpha') = \{\beta/(1-\alpha)\}^{\delta}$ and $(1-\beta')/\alpha' = \{(1-\beta)/\alpha\}^{\delta}$.
For $100\alpha = 100\beta = 5$ the percentage risk of errors of each of the two kinds is given below
for various values of $\gamma_2$:

gamma_2                     -1       0       1       2
100 alpha' = 100 beta'     0.28    5.00   12.32   18.66

As with the tests with fixed sample sizes the result will be approximately true for any
population with finite cumulants when the average sample sizes are not small. Roughly
$\alpha' = \alpha^{\delta}$, $\beta' = \beta^{\delta}$ for small values of $\alpha$ and $\beta$. Similar difficulties would be expected with
other sequential tests relating to the true value of, and equality of, variances; for example,
with the tests proposed by Girshick (1946) and the tests based on ranges suggested by
Cox (1949).
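
The risks tabulated above follow from the two relations for $\alpha'$ and $\beta'$. The sketch below treats the symmetric case $\alpha = \beta$, where both relations reduce to one equation in the odds; the link $\delta = 1/(1 + \tfrac12\gamma_2)$ is an assumption made here for illustration, chosen because it is consistent with the figures in the text.

```python
def actual_risk(nominal, gamma2):
    """Risk alpha' solving alpha'/(1 - alpha') = {alpha/(1 - alpha)}**delta when alpha = beta."""
    delta = 1.0 / (1.0 + gamma2 / 2.0)   # assumed link between delta and gamma_2
    odds = (nominal / (1.0 - nominal)) ** delta
    return odds / (1.0 + odds)

risks = {g: 100.0 * actual_risk(0.05, g) for g in (-1, 0, 1, 2)}
for gamma2, risk in risks.items():
    print(gamma2, round(risk, 2))
```

The values agree with the tabulated 0.28, 5.00, 12.32 and 18.66 to within about 0.01, the small difference at $\gamma_2 = 1$ presumably being rounding.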

(8.2) A test utilizing within-group information

A test on variances less sensitive to parent non-normality must clearly utilize the within- 
group information in some way. Although the author believes that a better approach is 
available, one immediately practical method is to split up the groups of observations into 
sub-groups, and carry out an analysis of variance between and within groups on the 
logarithms of the sub-group variances, following Bartlett & Kendall (1946), who have shown 
the value of the logarithmic transformation in bringing variance data to a form suitable for 
the application of analysis of variance. The question of what sizes should best be taken for 
the subgroups requires further investigation. 
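
The sub-group procedure can be sketched as follows. This is an illustration only: it assumes equal group sizes divisible by the sub-group size, returns the ordinary between-groups and within-groups mean squares of the log variances, and uses simulated rectangular data with hypothetical scale factors.

```python
import math
import random

def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def log_variance_anova(groups, subgroup_size):
    """Between and within mean squares of ln(sub-group variance)."""
    logs = []
    for g in groups:
        subs = [g[i:i + subgroup_size] for i in range(0, len(g), subgroup_size)]
        logs.append([math.log(sample_variance(s)) for s in subs])
    k, m = len(logs), len(logs[0])
    group_means = [sum(row) / m for row in logs]
    grand = sum(group_means) / k
    between = m * sum((gm - grand) ** 2 for gm in group_means) / (k - 1)
    within = sum((x - gm) ** 2 for row, gm in zip(logs, group_means) for x in row) / (k * (m - 1))
    return between, within

rng = random.Random(7)
same = [[rng.random() for _ in range(20)] for _ in range(6)]
mixed = [[scale * rng.random() for _ in range(20)] for scale in (1, 1, 1, 10, 10, 10)]

b, w = log_variance_anova(same, 4)
f_hom = b / w      # homogeneous variances
b, w = log_variance_anova(mixed, 4)
f_het = b / w      # variances differing by a factor of 100
print(round(f_hom, 2), round(f_het, 2))
```

Because the within-groups mean square is estimated from the data themselves, the ratio does not depend on a normal-theory value for the variance of $\ln s^2$.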


Table 5. Criteria calculated for ten samples of twenty groups of twenty
observations drawn from a rectangular population

Source of sample*        Test of     Tests of variances
                         means
                                                M_0 on logarithms of subgroup variances
Page          Rows       M_0         M_1        2 subgroups    5 subgroups    10 subgroups
                                                of 10          of 4           of 2
I             1-20       14.5         6.7       26.2           10.3           34.0
I             21-40      22.7         8.6       18.6           16.1           18.1
I and II      41-50,     13.8        10.8       29.6           29.3           19.3
              1-10
II            11-30      12.6        16.7       30.7           35.0           18.1
II            31-50      19.6         9.9       40.6           20.3           10.3
III           1-20       26.9         5.2       17.9           10.8           11.6
III           21-40      12.5        16.1       40.8           29.6           19.8
III and IV    41-50,     18.0         7.4       34.1           21.6           34.1
              1-10
IV            11-30      23.8        10.3       37.9           28.2           19.5
IV            31-50      11.8        10.3       30.6           32.4           29.6

Mean:     Found          17.6 ± 1.7  10.2 ± 1.2  30.7 ± 2.6    23.4 ± 2.8     21.4 ± 2.7
          Expected on    19.5        19.4        27.0          22.1           21.3
          normal theory
Variance: Found          29.3 ± 15.6 13.9 ± 7.2  66.1 ± 35.3   78.7 ± 42.0    71.0 ± 37.9
          Expected on    40.1        39.4        79.0          51.5           48.1
          normal theory

* The groups of observations consisted in each case of the first twenty columns of numbers from pages 
of Fisher & Yates’s tables. The page numbers and the numbers of the rows are shown above. 


The results of a small sampling experiment of some interest in this connexion are set out
in Table 5. Ten samples of twenty groups of twenty observations were drawn from the table
of random numbers prepared by Fisher & Yates (1938). The parent distribution was thus
effectively rectangular. As a test of group to group homogeneity of means a 'between and
within groups' analysis of variance was performed for each of the ten samples of twenty
groups and the ten resulting values of $M_0$ calculated. These are shown in the third column
of the table. Four different tests for homogeneity of the variances were applied to each
sample. The first was the $M_1$ test, the results for which are shown in the fourth column of the
table. In the remaining three columns are shown the values of $M_0$ for analysis of variance
performed on the logarithms of subgroup variances. The groups of twenty observations
were divided into subgroups in three different ways: (i) two subgroups of ten observations,
(ii) five subgroups of four, (iii) ten subgroups of two. The means and variances of the
observed values together with their approximate standard errors and the values expected
on normal theory are shown at the bottom of the table.

With so few samples, only large discrepancies of course would be detectable. We see that
the test on means shows no evidence of departure from the values expected on normal
theory even though the parent population is rectangular. The $M_1$ test for homogeneity of
variance on the other hand shows extremely large departures, the mean being only about
half the value expected on normal theory (for the rectangular parent distribution the
asymptotic mean value is two-fifths of the normal-theory value). In contrast, it is seen that
all the tests for homogeneity of variance based on $M_0$ give values agreeing fairly well with
what would be expected if it could be assumed that the distribution of the logarithm of the
variance was exactly normal, an assumption far from true, particularly for subgroups with
only two observations.
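
The summary rows of Table 5 can be recomputed from the ten tabulated values. The sketch below does this for the $M_1$ column; the standard error of the mean is taken as the square root of (variance / 10), which is assumed to be how the tabulated standard errors were formed.

```python
import math

# M_1 column of Table 5 (ten samples)
m1_values = [6.7, 8.6, 10.8, 16.7, 9.9, 5.2, 16.1, 7.4, 10.3, 10.3]

mean = sum(m1_values) / len(m1_values)
variance = sum((v - mean) ** 2 for v in m1_values) / (len(m1_values) - 1)
std_error = math.sqrt(variance / len(m1_values))
ratio_to_normal_theory = mean / 19.4   # 19.4 is the tabulated normal-theory expectation

print(round(mean, 1), round(variance, 1), round(std_error, 1))   # 10.2 13.9 1.2
print(round(ratio_to_normal_theory, 2))
```

The ratio of about one-half is the departure referred to in the text, to be compared with the asymptotic two-fifths for a rectangular parent.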


9. Discussion 


It has frequently been suggested that a test of homogeneity of variances should be applied
before making an analysis of variance test for homogeneity of means in which homogeneity
of variance is assumed. The present research suggests that when, as is usual, little is known
of the parent distribution, this practice may well lead to more wrong conclusions than if
the preliminary test was omitted. It has been shown (Welch, 1937; David & Johnson,
1951b; Box, 1952; and Horsnell, 1953) that in the commonly occurring case in which the
group sizes are equal, or not very different, the analysis of variance test is affected
surprisingly little by variance inequalities. Since this test is also known to be very insensitive
to non-normality it would be best to accept the fact that it can be used safely under most
practical conditions. To make the preliminary test on variances is rather like putting to sea
in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to
leave port!

When the groups of observations were of unequal size and differences in variances might
occur it would seem logical to replace the usual analysis of variance criterion which uses
a pooled estimate of within-groups variance by the alternative criterion proposed by Welch
(1951) and by James (1951), i.e. $\Sigma_t w_t(\bar{y}_t - \hat{y})^2$, where $w_t = n_t/s_t^2$. This criterion is robust to
inequality of variance and almost certainly to non-normality also. (In fact, where inequality
of variance might occur it would seem most logical to use Welch's criterion even if the groups
were equal.)
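
A minimal sketch of the quadratic form quoted above follows. It assumes that $\hat{y}$ is the $w$-weighted mean of the group means, as in Welch (1951); only the quantity itself is computed, not its reference distribution, and the data are hypothetical.

```python
def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def welch_criterion(groups):
    """Sum of w_t (ybar_t - yhat)^2 with w_t = n_t / s_t^2 and yhat the weighted mean (assumed)."""
    means = [sum(g) / len(g) for g in groups]
    weights = [len(g) / sample_variance(g) for g in groups]
    y_hat = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    return sum(w * (m - y_hat) ** 2 for w, m in zip(weights, means))

# Hypothetical groups of unequal size and spread
equal_means = [[1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 2.0], [1.5, 2.0, 2.5]]
shifted = [[1.0, 2.0, 3.0], [3.0, 5.0, 7.0, 5.0], [1.5, 2.0, 2.5]]
print(welch_criterion(equal_means), round(welch_criterion(shifted), 2))
```

Each group is weighted by the reciprocal of the estimated variance of its own mean, so a group with a large variance contributes little, whatever its size.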

When a criterion for testing a statistical hypothesis is derived (for example, by the
likelihood ratio method), it is usually necessary for purposes of mathematical convenience
to over-simplify the specification of the problem. We should not be surprised therefore if
an examination of the resulting criterion shows that the assumptions have sometimes, so
to speak, been interpreted rather too literally. For this reason it is most important that
derived criteria should be studied for robustness.

The property of robustness I believe to be even more important in practice than that the
test should have maximum power and that the statistics employed should be fully efficient.
Where necessary I believe that the latter qualities should be sacrificed to ensure the former.*
On the other hand, I do not think that we need necessarily go to the extreme of using
non-parametric tests when it may well be that more powerful robust parametric tests can be
found.


I am greatly indebted to Prof. E. S. Pearson for his interest and his many valuable
suggestions for the improvement of the presentation of this paper. In conclusion, I wish to
thank Mrs Margaret Edmondson for valuable assistance with the computations.


APPROXIMATING TO THE DISTRIBUTIONS OF MEASURES
OF DISPERSION BY A POWER OF $\chi^2$


By J. H. CADWELL, Ordnance Board 


1. INTRODUCTION 


The use of range in place of sample variance in various statistical tests has been the subject
of a number of recent papers. In order to make range and means of sets of ranges amenable
to certain lines of attack, two approximations have been proposed for the case of a normal
parent population. The first consists in replacing range by a multiple of $\chi$ with an appropriate
number of degrees of freedom. The two parameters at choice allow the mean and variance
of range to be matched by those of the approximating form. This method has been discussed
by Patnaik (1950), Hartley (1950) and Florin (1950); David (1951) gives tables of the
necessary constants. The second approach, due to Cox (1949), replaces range by a $\chi^2$ form.
Pearson (1952) has compared the errors arising from the two methods. He points out that
either can be regarded as an approximate transformation to the $\chi^2$ form, and thus a number
of standard tests are available. For instance, Bartlett's test for homogeneity of variance or
the short-cut tables prepared by David (1952) may be used. Similarly, the ratio of a pair of
ranges can be replaced by an approximate F-ratio.

Below we graduate the distribution of range by a multiple of a suitable power of $\chi^2$.
The extra parameter allows the first three moments to be matched, and much smaller errors
result. By inverting the relationship an approximate transformation to the $\chi^2$ form results.
It should be noted that, while the $\chi$ approximation is dimensionally correct, the others are
not. Thus the increased accuracy is obtained at the price of some loss of applicability of
the approximation.

The method gives good results up to sample size 20 and the necessary constants are
tabled. Results for the mean of a set of m ranges are also satisfactory and constants are given
for m from 2 to 5 and n from 2 to 10. As there are no probability integral tables available for
this statistic, these constants may be used to supply approximate percentage points from
those of $\chi^2$.

As the power of the statistic needed depends on sample size, the use of Bartlett’s test or 
of an F-ratio test is restricted to values for samples of the same size. In this case David’s 
5 and 1 % short-cut tables can be used. These have been transformed into equivalent tables 
for the ratio of maximum range to minimum range. Enough values have been checked by 
quadratures to ensure that these latter tables are sufficiently accurate for most purposes. 

In a paper to be published shortly, moment constants for the first quasi-range are given. 
This statistic is defined as the difference between the greatest-but-one and the least-but-one 
of the ordered values in the sample. For sample sizes beyond 17 it is more efficient than 
range as an estimator of standard deviation. It is also less influenced by the presence of an 
occasional ‘rogue’ observation, or by departures from the normal model. The same method 
of graduation gives good results and a table of constants is given. 

Mean deviation can be graduated in the same way, and transformation constants and 5 
and 1 % short-cut tables are provided. 

Wilson & Hilferty (1931) have shown that, for large degrees of freedom, the cube root of
$\chi^2$ is almost normal. This leads to an approximate normalization of statistics that can be
graduated by a power of $\chi^2$. An example is given in the section on range.

Another transformation of some interest is the use of the logarithm of $\chi^2$. This statistic
has a variance independent of the parent value of the standard deviation, and can be used
if it is desired to apply analysis of variance techniques to a set of observed standard deviations.
However, while the statistic approaches normality in large samples, it does so rather slowly.
Bartlett & Kendall (1946) suggest that the method should be used with caution for sample
sizes below 10. We see that the logarithmic transformation when applied to range, or any
of the other statistics considered below, will have the same properties. The variance of
$\ln w$ will still be independent of the parent standard deviation and will be approximately

$\alpha^2 \dfrac{d^2}{d(\tfrac12\nu)^2}\{\ln\Gamma(\tfrac12\nu)\}$.

The errors introduced by the approximation to $w$ will be small in comparison with the
departures from normality of $\ln\chi^2$. Thus conditions for the use of the logarithmic
transformation of range or mean deviation will be similar to those for its use with standard
deviation.

It seems likely that other applications of this method of graduation may occur. Thus
non-central $\chi^2$ can be well approximated by a central $\chi^2$ on a suitably chosen number of degrees
of freedom. Still better results could be obtained by the present method. The use of frequency
curves of the form

$a + b(\chi^2)^{\alpha}$   ($\chi^2$ on $\nu$ d.f.)

might also repay investigation.


2. DETERMINATION OF CONSTANTS


Consider the quantity

$x = (\tfrac12\chi^2)^{\alpha}$,

where $\chi^2$ is on $\nu$ degrees of freedom. The moments of $x$ are readily found; they are given by
the following relation:

$\mu'_r = \Gamma(r\alpha + \tfrac12\nu)/\Gamma(\tfrac12\nu)$.   (1)

Using Stirling's series, the expansions below follow at once:

[series for $V^2$ and $\sqrt{\beta_1}$ in inverse powers of $\nu$; illegible in source]   (2), (3)

On solving (2) and (3) for $\nu$ we obtain

[series for $\nu$ in terms of $V$ and $\sqrt{\beta_1}$; illegible in source]   (4)

Inversion of (2) gives the series

[series for $\alpha$ in terms of $V$ and $\nu$; illegible in source]   (5)

If the values of $V$ and $\beta_1$ for a particular statistic $u$ be inserted in (4) and (5), we shall have

$u$ is approximately distributed like $(\chi^2/c)^{\alpha}$,   (6)
$cu^{\lambda}$ is approximately distributed like $\chi^2$.   (7)

The constant $c$ is chosen to make the means of the two variates in (6) agree. We have
$\lambda = 1/\alpha$ and $\log c = \log 2 + \lambda\{\log\Gamma(\alpha + \tfrac12\nu) - \log\Gamma(\tfrac12\nu) - \log\mathcal{E}(u)\}$.   (8)


The expression for $\log c$ is readily evaluated using the log factorial tables prepared by
Brownlee (1923).
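
With a log-gamma routine in place of log factorial tables, (8) can be evaluated directly. This is a sketch, not the paper's procedure: the function name and argument names are illustrative, and the check uses the range of two unit-variance normal observations, whose mean $2/\sqrt{\pi}$ is a standard result and for which the transformation is exact with $\alpha = \tfrac12$, $\nu = 1$.

```python
import math

def scale_constant(alpha, nu, mean_u):
    """Constant c of (8): ln c = ln 2 + lam*{ln G(alpha + nu/2) - ln G(nu/2) - ln(mean of u)}."""
    lam = 1.0 / alpha
    log_c = math.log(2.0) + lam * (
        math.lgamma(alpha + 0.5 * nu) - math.lgamma(0.5 * nu) - math.log(mean_u)
    )
    return math.exp(log_c)

# Range of two unit-variance normal observations: alpha = 1/2, nu = 1
c_range2 = scale_constant(0.5, 1.0, 2.0 / math.sqrt(math.pi))
# chi-squared itself on 7 d.f. (alpha = 1, mean 7) needs no rescaling
c_identity = scale_constant(1.0, 7.0, 7.0)
print(round(c_range2, 6), round(c_identity, 6))
```

The first value is one-half, as it should be: the range of two observations is $|x_1 - x_2|$, and $\tfrac12(x_1 - x_2)^2$ is exactly $\chi^2$ on 1 degree of freedom.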

For small $\nu$ or large $\alpha$, (4) and (5) will not be sufficiently accurate. The following procedure
was found to be satisfactory for range, quasi-range and mean deviation.

Since (5) comes directly from (2), the use of an approximate value for $\nu$, while affecting
the $\beta_1$ match of $u$ and $x$, does not affect the matching of mean and variance. Thus, for $\nu > 10$
it is found sufficient to round to the nearest integer and substitute this value in the first two
terms of (5) to determine $\alpha$.

When $10 > \nu > 4$ it is better to take $\nu$ to the nearest 0.1, and three terms of (5) are needed.
In practice it was found preferable to solve

$V^2 + 1 = \dfrac{\Gamma(2\alpha + \tfrac12\nu)\,\Gamma(\tfrac12\nu)}{\{\Gamma(\alpha + \tfrac12\nu)\}^2}$.   (9)

Thus, for mean deviation with $n = 5$, two terms of (5) give $\alpha = 0.554$. Using (9) twice
we find

$\alpha = 0.55$, $V^2 = 0.13655$,
$\alpha = 0.56$, $V^2 = 0.14131$.

Inverse linear interpolation for the required value of $V^2$ (0.13929) gives $\alpha = 0.5558$. The
use of trial values of $\alpha$ with a spacing of 0.01 obviates interpolation in Brownlee's tables
and gives a maximum error in $\alpha$ of a few units in the fifth place.
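
The interpolation step can be reproduced in a few lines. The sketch below evaluates the right-hand side of (9) with a log-gamma routine and then repeats the inverse linear interpolation using the two trial values quoted in the text; the function names are illustrative only.

```python
import math

def cv_squared(alpha, nu):
    """V^2 from (9): V^2 + 1 = G(2a + nu/2) G(nu/2) / G(a + nu/2)^2."""
    half = 0.5 * nu
    log_ratio = math.lgamma(2 * alpha + half) + math.lgamma(half) - 2 * math.lgamma(alpha + half)
    return math.exp(log_ratio) - 1.0

def inverse_linear(x0, y0, x1, y1, y_target):
    """Value of x at which the straight line through (x0, y0), (x1, y1) reaches y_target."""
    return x0 + (x1 - x0) * (y_target - y0) / (y1 - y0)

# Trial values and target quoted in the text for mean deviation with n = 5
alpha = inverse_linear(0.55, 0.13655, 0.56, 0.14131, 0.13929)
print(round(alpha, 4))   # 0.5558

# Sanity check of (9): for alpha = 1 the variate is chi-squared itself and V^2 = 2/nu
print(round(cv_squared(1.0, 8.0), 6))   # 0.25
```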

When $\nu < 4$ the series (4) still gives very good results. Thus for range with $n = 3$ it gives
2.08, the true value being 2.05. However, it is now advisable to retain the second decimal
in $\nu$. Thus it becomes necessary to solve (9) and the relation

$\sqrt{\beta_1}\,V^3 + 3V^2 + 1 = \dfrac{\Gamma(3\alpha + \tfrac12\nu)\,\{\Gamma(\tfrac12\nu)\}^2}{\{\Gamma(\alpha + \tfrac12\nu)\}^3}$.   (10)

This is readily done using trial values of $\nu$ rounded to 0.02 to obviate interpolation in the
log factorial tables. Thus, for range with $n = 4$, (4) gives $\nu = 3.23$. The values of $V^2$ and $\sqrt{\beta_1}$
at each of the four points

$\nu = 3.20$, $\alpha = 0.52$,     $\nu = 3.24$, $\alpha = 0.52$,
$\nu = 3.20$, $\alpha = 0.53$,     $\nu = 3.24$, $\alpha = 0.53$,

are then found using (9) and (10). Double inverse linear interpolation for the required values
of $V^2$ and $\sqrt{\beta_1}$ then gives $\nu = 3.202$, $\alpha = 0.5257$.

With an interval of 0.04 in trial values of $\nu$, neglect of second differences only affects $\alpha$
slightly in the fifth decimal place in the most unfavourable case.

It might be desirable to use a fixed value of $\lambda$ for different sample sizes. This value could
be chosen as a compromise between the values given for the various sample sizes in Table 1.
The values of $\nu$ for each sample size will be found from

$\nu = \nu_0 + (\alpha - 1)^2 - \cdots$,   (11)

where $\nu_0 = 2\alpha^2/V^2$ and $\alpha = 1/\lambda$.
When $\alpha = \tfrac12$ this reduces to the series developed by Florin (1950) for the $\chi$ approximation.
Two decimals in $\nu$ are usually adequate, and for $\nu < 4$ we solve (9) for $\nu$ using trial values
suggested by (11).

It sometimes happens that all but one or two samples are of the same size; the $\lambda$
appropriate to this size may then be used throughout. For the exceptional samples $\nu$ and $c$ are
found from (11) and (8); for the standard size the values of Table 1 are used.*

3. RANGE 


In order to apply an approximate test of homogeneity to a set of ranges, each for the same
sample size, we first transform to approximate $\chi^2$ values using the constants of Table 1.
Then the usual Bartlett test is applied to these values. The adequacy of the approximate
transformation to the $\chi^2$ form was investigated as follows.

Tabular values of $w$, as near as possible to the percentage points listed below, were
transformed by (7). The approximate probability integral was then found from tables of the
$\chi^2$ integral. This was compared with the exact value in tables of the probability integral
of range prepared by Pearson & Hartley (1942). Errors in the approximate probability
integral, expressed in units of the fourth decimal, are tabled below:

(Entries shown as .. are illegible in the source.)

          % point
  n       0.5      5      20      50      80      95     99.5
  3       ..      -2      +2      +3      ..       8      ..
  5       ..      ..      +5     +65      -8      -2       0
 10       -3      -3      +5      +4      ..      ..       0
 15       -2       0     +17       0      -3       0      +1
 20       -2      +1      +5      -2      ..      +1       0

David's (1952) 5 and 1 % short-cut tables for the maximum F-ratio, transformed by (7),
gave Tables 3 and 4. Exact values of $\nu$, not the rounded values of Table 1, were taken.
A four-point Lagrange formula was used when interpolating for $\nu$, and harmonic values
of $\nu$ were employed when it exceeded 10. For small $\nu$ the use of variance ratio became
impossible, but the ratio of $s_{\min}$ to $s_{\max}$ was quite satisfactory for interpolation. The table
for the 1 % values was not taken beyond $n = 10$ because interpolation and rounding errors
begin to affect probabilities appreciably beyond this value.

We shall now apply a check to these values. If, in a set of $k$ ranges, $M$ denotes the ratio
of maximum range to minimum range, we have

$\Pr(M > M_0) = 1 - k\int_0^{\infty}\{F(M_0 x) - F(x)\}^{k-1} f(x)\,dx$.   (12)


* All numbered Tables have been placed together at the end of the paper. 

Using values of $M_0$ in Tables 3 and 4, the following probabilities were found by quadrature:

% point             k = 2      k = 6      k = 11
5 %     n = 5       0.0506     0.0514     0.0526
        n = 10      0.0508     0.0504     0.0516
        n = 20      0.0515     0.0516     0.0496
1 %     n = 5       0.0104     0.0106     0.0107
        n = 10      0.0103     0.0102     0.0094

These values show a slight bias, but are satisfactory for most purposes. In view of the size
of the error at $n = 5$, $k = 11$, the true probability was also found at the nominal 5 % level
for $n = 4$, $k = 11$; it was 0.0513. While the effect on probability of a change of +0.01 in the
ratio varies over the tables, average values are -0.0004 for the 5 % tables and -0.0002
for the 1 % table.
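
The quadrature in (12) can be illustrated in the one case where a closed form is available for comparison. The sketch below assumes that $F$ and $f$ in (12) are the distribution and density functions of a single range, and takes $n = 2$, $k = 2$ with unit parent variance, so that each range is $|x_1 - x_2|$ and the ratio of the two ranges is the absolute value of a Cauchy variate. The step count and upper limit of integration are arbitrary choices.

```python
import math

def range2_cdf(x):
    """Distribution function of the range of two unit-variance normal observations."""
    return math.erf(x / 2.0)

def range2_pdf(x):
    return math.exp(-x * x / 4.0) / math.sqrt(math.pi)

def prob_ratio_exceeds(m0, k, upper=12.0, steps=40000):
    """Pr(M > M0) = 1 - k * integral of {F(M0 x) - F(x)}^(k-1) f(x) dx, trapezoidal rule."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        value = (range2_cdf(m0 * x) - range2_cdf(x)) ** (k - 1) * range2_pdf(x)
        total += value if 0 < i < steps else 0.5 * value
    return 1.0 - k * h * total

m0 = 25.45
p_quad = prob_ratio_exceeds(m0, k=2)
# Closed form for n = 2, k = 2: the ratio of two such ranges is |Cauchy|
p_exact = 2.0 - (4.0 / math.pi) * math.atan(m0)
print(round(p_quad, 4), round(p_exact, 4))
```

Both values come to about 0.05, so $M_0 = 25.45$ is close to the 5 % point of the ratio of two ranges of two observations.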

We now turn to another application of the method. The Wilson-Hilferty approximation
uses the fact that, for large $\nu$,

$\mathcal{E}\{(\chi^2/\nu)^{1/3}\} = 1 - \dfrac{2}{9\nu} + O(\nu^{-3})$,   $\mathrm{var}\{(\chi^2/\nu)^{1/3}\} = \dfrac{2}{9\nu} + O(\nu^{-3})$.

In addition, the distribution of this quantity is nearly normal. Applying this fact to range
we find

for $n = 5$, $w^{\lambda/3}$ is approximately Nr. {1.6443, 0.3844},*
for $n = 10$, $w^{\lambda/3}$ is approximately Nr. {1.7163, 0.2197}.

The errors in the approximate probability integral, found from the normal integral, are
shown below in units of the fourth decimal:

(The entry shown as .. is illegible in the source.)

% point     0.5      5      20      50      80      95     99.5
n = 5       +20     +19     -15     -16     -10      +4      ..
n = 10       +8      -1      -3       1      +2       0      -3

While the approximation is good at n = 10, there are large errors when n is as small as 5. 
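
The Wilson-Hilferty moments used above can be compared with exact values, since the fractional moments of $\chi^2$ follow from the gamma function. The sketch below is an independent check, not part of the paper's calculation; the choice of $\nu$ values is arbitrary.

```python
import math

def cube_root_moments(nu):
    """Exact mean and variance of (chi^2/nu)^(1/3) from E[(chi^2/nu)^p] = (2/nu)^p G(nu/2 + p)/G(nu/2)."""
    half = 0.5 * nu

    def raw(power):
        return math.exp(power * math.log(2.0 / nu) + math.lgamma(half + power) - math.lgamma(half))

    mean = raw(1.0 / 3.0)
    return mean, raw(2.0 / 3.0) - mean ** 2

for nu in (5, 10, 20):
    mean, var = cube_root_moments(nu)
    print(nu, round(mean, 5), round(1 - 2 / (9 * nu), 5), round(var, 5), round(2 / (9 * nu), 5))

mean10, var10 = cube_root_moments(10)
```

The moments agree closely even at small $\nu$; the larger errors at $n = 5$ in the table above therefore reflect the departure of the shape from normality rather than any error in the mean and variance.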


4. MEAN RANGE 


One method used when estimating dispersion in large samples consists in splitting the sample
into subgroups. The mean of the subgroup ranges is then used as an estimator of standard
deviation. Grubbs & Weaver (1947) have discussed the optimum method of splitting the
sample. The mean of $m$ ranges, each for the same sample size, will often arise naturally in
other ways. Thus we might have $m$ independent samples in which the variate has a fixed
dispersion, but may take different mean levels in the various samples.

Table 2 gives the constants needed to transform the mean of $m$ ranges, each for a sample
of $n$, to the $\chi^2$ form by means of (7). Except in one case, discussed later, no simple method
is available for checking this approximation directly. Consequently the following procedure
was used.


* Nr. {$\mu$, $\sigma$} is used to denote a normal distribution, having mean = $\mu$ and standard deviation = $\sigma$.

While the two variates in (6) agree in their first three moments, the $\beta_2$ values will differ.
Using (1) the following values were found for this $\beta_2$ difference. The $\beta_2$ of $u$ was the larger
of the two throughout:

  n       m = 1      m = 2      m = 3      m = 4
  2        0        -0.013     -0.012     -0.008
  3      -0.005     -0.004     -0.002     -0.002
  4      -0.010     -0.005     -0.003     -0.002
 10      -0.015     -0.009     -0.006     -0.004

The percentage point tables prepared by Pearson & Merrington (1951) show that, for the
region of the ($\beta_1$, $\beta_2$) plane concerned, a given $\beta_2$ error has a greater effect for large values
of $\beta_1$ than for small; its effect varies but little with $\beta_2$. For the largest value of $\beta_2$ in the above
table ($n = 2$, $m = 1$) the transformation is exact. The next largest, at $m = n = 2$, also shows
the second largest $\beta_2$ difference.

The probability density function for a single range is of simple form when n = 2, and we
find by integration that the mean of two such ranges has the density (in units of σ):

$$ f(w) = \frac{4}{\pi}\, e^{-w^2/2} \int_0^{w} e^{-t^2/2}\, dt \qquad (w \ge 0). $$


The probability integral of w was found by quadrature, and the previous method of checking
used to give the following errors in units of the fourth decimal:

% point    0.5    5    20    50    80    95    99.5
Error      −3    −5    +4   +11    −1    −4     +1





























Since (β₁, β₂) approaches the normal point as m increases, this test, together with those made
for m = 1 in the previous section, may be assumed to cover the least favourable cases.


5. THE FIRST QUASI-RANGE 


In samples too large for the direct use of range, it may be necessary to randomize the 
observations before splitting them into groups. For instance, we may suspect that values 
are not independent of the order in which they are recorded. The lack of uniqueness may also 
be a drawback to the use of average range in some circumstances. 

In such cases the first quasi-range, defined as the difference between the greatest-but-one
and the least-but-one of the ordered values in the sample, may prove useful. Using the
tables of expected values of range given by Tippett (1925) and reproduced in Tables for
Statisticians and Biometricians, Vol. 2, the mean value of this statistic can be found from

$$ E(x_{n-1} - x_2) = 2E(x_{n-1}) = 2n(n-1) \int x\, F^{n-2}(1-F)\, dF $$
$$ = 2n(n-1) \int x\, F^{n-2}\, dF - 2n(n-1) \int x\, F^{n-1}\, dF $$
$$ = n\,E(w_{n-1}) - (n-1)\,E(w_n). $$
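The relation between the mean quasi-range and the mean ranges for samples of n − 1 and n can be checked numerically for a standard normal parent. The following is a minimal sketch, assuming plain midpoint quadrature over [−8, 8]; the function names are illustrative and are not taken from the paper.

```python
import math

def norm_pdf(x):
    # Standard normal density.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    # Standard normal distribution function F(x).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def integrate(f, lo=-8.0, hi=8.0, steps=20000):
    # Midpoint rule; the integrands here are smooth and vanish in the tails.
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

def expected_range(n):
    # E(w_n) = 2 E(largest of n) = 2n * integral of x F^(n-1) dF.
    return 2.0 * n * integrate(
        lambda x: x * norm_cdf(x) ** (n - 1) * norm_pdf(x))

def expected_quasi_range(n):
    # 2 E(greatest-but-one of n) = 2n(n-1) * integral of x F^(n-2) (1-F) dF.
    return 2.0 * n * (n - 1) * integrate(
        lambda x: x * norm_cdf(x) ** (n - 2) * (1.0 - norm_cdf(x)) * norm_pdf(x))

def quasi_range_from_ranges(n):
    # Right-hand side of the identity: n E(w_{n-1}) - (n-1) E(w_n).
    return n * expected_range(n - 1) - (n - 1) * expected_range(n)
```

Calling `expected_quasi_range(10)` and `quasi_range_from_ranges(10)` should give the same value up to rounding, which is the point of using tabulated mean ranges instead of a fresh integration.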










Using the first three moments of quasi-range* the constants of Table 1 were found. These
enable approximate percentage points to be deduced from those of χ² by using (6). Errors
in the approximate probability integral of quasi-range found from (7) are given below in
units of the fourth decimal:





























% point    0.5    5    20    50    80    95    99.5
n
10 os -2 +1 +4 0 = 0 
20 -2 0 +6 +2 ~h out +1 
30 -2 +1 +4 -2 a8 =4 +1 











In order to avoid double interpolation in the Hartley-Pearson (1950) tables of χ², values
of ν have been rounded to the nearest even integer when above 30.


6. MEAN DEVIATION 


Work with this statistic was on the same lines as for range. Errors in the probability 
integral appear below; these are expressed in units of the fourth decimal: 


% point    0.5    5    20    50    80    95    99.5
 n
 3         −1    −2    +2    +3    −2    −2    −1
 4         −2    −5    +7   +10    −6    −3    +1
 5         −2    −4    +8    +3    −6    −1    +1
 6         −2    −3    +5    +3    −3     0    +1
10         −1    −1    +1    +3    −1    −1     0






































As no tables are available beyond sample size 10, β₂ differences in (6) were again examined.

Series expansions for β₁ and β₂, due to Geary, are given by Pearson in an Editorial Note
to a paper by Godwin & Hartley (1945). Godwin (1948) gives a minor correction to the /, 
series. Fisher (1920) derived the exact distribution of mean deviation when n = 4, and
gave the values β₁ = 0.297, β₂ = 3.28. The series gives the values 0.299 and 3.252
respectively.

An examination of Fisher's paper shows that the fourth moment of σ₁ (the mean
deviation estimator of σ) given there should be replaced by


1657 


(5a — 10) + 548” 


97 
Fa al = cos-!} 
518 (a = cos“! 4). 
Using this value we obtain β₁ = 0.2983, β₂ = 3.252, in very good agreement with Geary's
series.

The values of β₂ differences found are shown below:

n                  3        4        5        6        10       15       20
β₂ difference    −0.005   −0.027   −0.017   −0.011   −0.008    0.004    0.003




























* To be published shortly in a paper accepted for the Annals of Mathematical Statistics. 
The large values at 4 and 5 are reflected in the errors found in the approximate probability
integral. As mean deviation moves steadily towards normality with increase of n, we can 
expect errors beyond 10 to be small. 

Tables 5 and 6 give the approximate upper 5 and 1 % values of the ratio of maximum 
mean deviation to minimum mean deviation in a set of k values. Using (12) the following 
values of probability were found from these tables by quadrature: 














% point     n       k = 3     k = 6     k = 11
  5 %       4       0.0510    0.0520    0.0522
           10       0.0506    0.0506    0.0509
  1 %       4       0.0104    0.0108    0.0111
           10       0.0100    0.0101    0.0099




















It will be seen that errors are similar to those for range. For sample sizes below 10 it will 
be found that the critical ratio differs very little for range, mean deviation or standard 
deviation. 

The use of rounded values of ν in Table 1 obscures the fact that λ tends to a limit as n
increases; this can readily be proved by using Geary's series. This suggests that, with a
slight loss of accuracy, a fixed value of λ could be employed. It turns out that 1.8 is a
satisfactory value, and appropriate tables are being prepared. The use of a fixed λ greatly
increases the flexibility of the method by removing the restriction to a fixed sample size.


I should like to thank Mr D. F. Mills for his assistance in the preparation of the tables. 
I am indebted to Prof. Pearson for his advice on the presentation of this work, and for 
making an advance copy of the short-cut tables of critical variance ratios available. 

Acknowledgement is made to the Chief Scientist, Ministry of Supply, for permission to 
publish this paper. 
Table 1. Transformation constants for range, mean deviation and first quasi-range

            Range                    Mean deviation             First quasi-range
 n     ν      λ       log c       ν      λ       log c       n     ν      λ       log c
                                                             10    8.0   1.7289   0.3590
                                                             11    9.3   1.7144   0.3884
 2    1.00   2        1̄.6990     1.00   2        0.3010     12    11    1.6683   0.4441
 3    2.05   1.9619   1̄.7632     2.05   1.9619   0.5766     13    12    1.6810   0.4477
 4    3.20   1.9029   1̄.8453     3.35   1.8437   0.7630     14    14    1.6271   0.5110
 5    4.5    1.8298   1̄.9393     4.6    1.7992   0.8851     15    16    1.5845   0.5648
 6    6.0    1.7489   0.0412      5.9    1.7655   0.9823     16    18    1.5497   0.6117
 7    7.7    1.6700   0.1426      7.2    1.7437   1.0612     17    20    1.5202   0.6532
 8    9.5    1.6046   0.2315      8.4    1.7394   1.1235     18    22    1.4950   0.6908
 9    12     1.5078   0.3543      9.6    1.7355   1.1778     19    24    1.4730   0.7250
10    14     1.4643   0.4215      11     1.7164   1.2326     20    27    1.4257   0.7857
11    16     1.4292   0.4794      12     1.7310   1.2695     21    29    1.4102   0.8129
12    19     1.3605   0.5748      13     1.7428   1.3036     22    32    1.3738   0.8629
13    22     1.3075   0.6537      15     1.6918   1.3596     23    36    1.3236   0.9290
14    26     1.2392   0.7521      16     1.7036   1.3874     24    38    1.3153   0.9482
15    30     1.1860   0.8338      17     1.7144   1.4136     25    42    1.2755   1.0036
16    34     1.1427   0.9036      18     1.7235   1.4383     26    46    1.2413   1.0530
17    40     1.0783   1.0030      20     1.6872   1.4799     27    50    1.2115   1.0977
18    46     1.0276   1.0858      21     1.6966   1.5012     28    54    1.1853   1.1384
19    52     0.9865   1.1569      22     1.7053   1.5215     29    58    1.1618   1.1758
20    60     0.9358   1.2431      23     1.7129   1.5409     30    62    1.1409   1.2104

If the statistic is u, then we have:
cu^λ has approximately the χ² distribution on ν degrees of freedom.
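To illustrate how the constants of Table 1 are used, the relation between the statistic u and χ² can be inverted to turn a tabulated χ² percentage point into an approximate percentage point of u. The sketch below assumes the statistic is expressed in units of the population standard deviation; the function name is hypothetical, and 23.685 is the usual tabulated upper 5 % point of χ² on 14 degrees of freedom, which does not appear in this paper.

```python
def point_from_chi2(chi2_point, lam, log10_c):
    """Solve c * u**lam = chi2_point for u, where c = 10**log10_c."""
    c = 10.0 ** log10_c
    return (chi2_point / c) ** (1.0 / lam)

# Range in samples of n = 10: Table 1 gives nu = 14, lambda = 1.4643, log c = 0.4215.
# 23.685 is the upper 5 % point of chi-square on 14 d.f. (assumed, from standard tables).
upper_5pc_range_n10 = point_from_chi2(23.685, 1.4643, 0.4215)
```

The result is close to 4.47 standard deviations, in line with the published upper 5 % point of range for n = 10. Entries of log c printed with a barred characteristic, such as 1̄.6990, should be entered as the negative logarithm, here −0.3010.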
Table 2. Transformation constants for the mean of m ranges, each for a sample of size n

              m = 2                          m = 3
 n      ν      λ       log c          ν      λ       log c
 2     2.22   1.8376   0.1644        3.46   1.7822   0.3918
 3     4.4    1.8539   0.1749        6.7    1.8282   0.3791
 4     6.7    1.8339   0.2217        10     1.8302   0.4064
 5     9.3    1.7822   0.2948        14     1.7740   0.4823
 6     12     1.7385   0.3617        18     1.7337   0.5449
 7     15     1.6849   0.4364        23     1.6633   0.6355
 8     19     1.5995   0.5434        29     1.5838   0.7372
 9     23     1.5375   0.6283        34     1.5477   0.7952
10     28     1.4620   0.7285        42     1.4613   0.9066

              m = 4                          m = 5
 n      ν      λ       log c          ν      λ       log c
 2     4.7    1.7562   0.5408        6.0    1.7310   0.6572
 3     9.0    1.8152   0.5178        11     1.8339   0.6044
 4     14     1.7816   0.5739        17     1.8054   0.6529
 5     19     1.7547   0.6257        24     1.7446   0.7330
 6     24     1.7319   0.6731        30     1.7310   0.7719
 7     30     1.6812   0.7449        38     1.6683   0.8544
 8     38     1.5959   0.8503        48     1.5870   0.9567
 9     46     1.5354   0.9336        58     1.5288   1.0381
10     56     1.4605   1.0327        70     1.4607   1.1300

If the statistic is u, then we have:
cu^λ has approximately the χ² distribution on ν degrees of freedom.


Table 3. Approximate upper 5 % points of the ratio of maximum value to minimum value in
a set of k independent ranges. Each range is for a sample of size n from a normal population
of fixed standard deviation

 n \ k    2      3      4      5      6      7      8      9      10     11     12
 3      6.28   9.32   11.9   14.2   16.3   18.2   20.0   21.7   23.3   24.8   26.3
 4      3.96   5.31   6.32   7.20   7.95   8.63   9.24   9.76   10.3   10.7   11.1
 5      3.15   4.02   4.63   5.10   5.53   5.93   6.26   6.55   6.80   7.05   7.28
 6      2.74   3.37   3.82   4.16   4.47   4.71   4.93   5.14   5.32   5.50   5.66
 7      2.49   2.99   3.34   3.61   3.25   4.04   4.22   4.37   4.51   4.63   4.75
 8      2.32   2.75   3.04   3.27   3.46   3.62   3.75   3.88   3.99   4.09   4.18
 9      2.20   2.58   2.83   3.03   3.19   3.32   3.44   3.55   3.64   3.73   3.81
10      2.11   2.45   2.68   2.84   2.99   3.11   3.21   3.31   3.39   3.46   3.53
12      1.97   2.26   2.46   2.60   2.72   2.82   2.90   2.98   3.05   3.11   3.16
15      1.85   2.09   2.25   2.37   2.46   2.54   2.61   2.67   2.73   2.78   2.83
20      1.72   1.92   2.05   2.14   2.21   2.28   2.34   2.38   2.42   2.46   2.50




















































Table 4. Approximate upper 1 % points for the range ratio 








 n \ k    2      3      4      5      6      7      8      9      10     11     12
 3      14.1   21.0   26.7   31.8   36.4   40.7   44.8   48.5   52.1   55.4   58.7
 4      6.91   9.28   11.0   12.4   13.7   14.8   15.8   16.7   17.6   18.5   19.1
 5      4.87   6.16   7.06   7.76   8.44   9.04   9.51   9.93   10.5   10.8   11.2
 6      3.96   4.80   5.40   5.86   6.30   6.66   6.99   7.25   7.51   7.76   7.96
 7      3.44   4.08   4.52   4.86   5.16   5.43   5.66   5.86   6.06   6.26   6.46
 8      3.11   3.63   3.98   4.27   4.50   4.71   4.88   5.05   5.18   5.31   5.44
 9      2.88   3.33   3.62   3.86   4.05   4.20   4.34   4.47   4.60   4.71   4.81
10      2.71   3.10   3.35   3.56   3.73   3.87   4.00   4.10   4.20   4.29   4.38












































Table 5. Approximate upper 5 % points for the ratio of maximum mean deviation to minimum 
mean deviation in a set of k independent values. Each value is derived from a sample of 
n from a normal population of fixed standard deviation 












































 n \ k    2      3      4      5      6      7      8      9      10     11     12
 3      6.28   9.32   11.9   14.2   16.3   18.2   20.0   21.7   23.3   24.8   26.3
 4      3.96   5.32   6.31   7.20   7.95   8.64   9.22   9.74   10.3   10.8   11.2
 5      3.17   4.04   4.64   5.13   5.55   5.96   6.30   6.59   6.85   7.09   7.31
 6      2.74   3.37   3.81   4.16   4.47   4.70   4.93   5.14   5.32   5.49   5.65
 7      2.48   2.98   3.33   3.60   3.82   4.02   4.19   4.34   4.48   4.62   4.74
 8      2.30   2.72   3.01   3.23   3.42   3.57   3.72   3.84   3.95   4.04   4.13
 9      2.17   2.54   2.78   2.97   3.13   3.26   3.37   3.48   3.57   3.65   3.73
10      2.07   2.39   2.61   2.77   2.91   3.02   3.12   3.21   3.29   3.36   3.43
12      1.92   2.19   2.36   2.50   2.60   2.70   2.77   2.84   2.91   2.96   3.01
15      1.77   1.99   2.13   2.24   2.32   2.39   2.45   2.51   2.56   2.60   2.64
20      1.63   1.80   1.91   1.99   2.05   2.10   2.15   2.19   2.22   2.26   2.28
30      1.49   1.61   1.68   1.74   1.78   1.82   1.85   1.88   1.91   1.93   1.94
60      1.31   1.38   1.43   1.46   1.48   1.50   1.52   1.54   1.55   1.56   1.57
 ∞      1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00








Table 6. Approximate upper 1 % points for the mean deviation ratio 








 n \ k    2      3      4      5      6      7      8      9      10     11     12
 3      14.1   21.0   26.7   31.8   36.4   40.7   44.8   48.5   52.1   55.4   58.7
 4      6.92   9.30   11.0   12.3   13.6   14.8   15.9   16.8   17.6   18.4   19.0
 5      4.89   6.18   7.12   7.83   8.49   9.04   9.50   10.0   10.5   10.9   11.2
 6      3.96   4.79   5.41   5.86   6.30   6.66   6.98   7.26   7.52   7.75   7.96
 7      3.43   4.07   4.51   4.85   5.15   5.41   5.64   5.85   6.06   6.26   6.44
 8      3.08   3.60   3.95   4.22   4.45   4.67   4.84   5.00   5.15   5.28   5.40
 9      2.83   3.27   3.56   3.80   3.98   4.15   4.29   4.42   4.53   4.63   4.72
10      2.65   3.02   3.27   3.47   3.64   3.77   3.89   3.99   4.09   4.18   4.26
THE POWER FUNCTION OF SOME TESTS BASED ON RANGE
By H. A. DAVID 


1. INTRODUCTION 


In a number of recent papers (Patnaik, 1950; Hartley, 1950a; David, 1951) tests based on 
the sample range have been developed to deal with the analysis of simple and double 
classifications as well as of some more complex orthogonal designs. It is our purpose to 
compare the power function of these short-cut tests with that of the analysis of variance 
procedure usually employed. 

The new test criteria which may be used in place of the F-ratio are of two forms, depending
on whether the test is one for a main factor or an interaction. In the former case the test
function is the ratio of a single range to an independent mean range estimator of the
population standard deviation σ, all ranges being taken over a small number of variates. In this
situation a mean range can be closely approximated by a χ-distribution (with fractional
degrees of freedom ν) so that the test ratio may be referred to tables of the 'Studentized'
range q. The same χ-approximation allows us also to deal with the ratio of two independent
mean ranges which arises in a test for an interaction.

The power of these tests has not been determined for any of the above designs, except in
special cases. However, there has been evidence suggesting that the tests are likely to be
reasonably powerful. Thus: (a) It is known that for a small sample size n the range provides
a fairly efficient estimator of σ and that for large n (n > 12) a very good estimator of σ is
given by the mean of ranges taken over small subgroups of the original sample (Davies &
Pearson, 1934; Grubbs & Weaver, 1947). (b) For all of the above classifications tables are
available giving the 'equivalent degrees of freedom' ν of estimates of error found from mean
ranges (see, for example, David, 1951, pp. 408-9). ν is generally not much smaller than the
number of error degrees of freedom in the analysis of variance. (c) Extensive investigations
by quadrature methods made by Lord (1950) show the power of his u-test (Lord, 1947),
employing a range estimate of variance, to be little less (on the systematic model) than that
of the corresponding t-test, especially when mean ranges are employed.

These considerations suggest that the use of a mean range instead of a root mean square 
as a ‘Studentizing’ statistic leads to a definite but small loss in power. This will be confirmed 
in the sequel. It will also be shown that if a random probability model is relevant, range 
methods are somewhat less powerful than the standard procedure. For the systematic 
case attention is confined to a number of special alternate hypotheses. The q-test, whilst 
always retaining the property of unbiasedness (in the Neyman-Pearson sense) of the F-test, 
is found in certain rather special but important practical situations to be more powerful 
than the F-test. 


2. THE RANDOM MODEL 
We consider first the random set-up

$$ x_{ij} = K + u_i + z_{ij} \qquad (i = 1, \ldots, l;\ j = 1, \ldots, m), \qquad (1) $$

where K is a constant and the u_i, z_ij are all mutually independent normal deviates with
variances σ'² (for u) and σ² (for z). The presence of the u-terms in (1) makes the variance of
the means x̄_i equal to (σ² + mσ'²)/m instead of σ²/m; hence the usual ratio of the
between-groups mean square to the within-groups mean square is distributed not as F but as
(1 + mξ²)F, where ξ = σ'/σ, the degrees of freedom of F being ν₁ = l − 1, ν₂ = l(m − 1). In
this case the single-tailed F-test is known to be the uniformly most powerful test of the null
hypothesis that ξ = 0 against alternatives for which ξ > 0.

When range methods are employed, we test the null hypothesis by referring the ratio

$$ q_w = \frac{\sqrt{m}\,\operatorname{range}(\bar{x}_i)}{\bar{w}/c} \qquad (2) $$

to tables of percentage points of the 'Studentized' range q (Pearson & Hartley, 1943;
May, 1952; Hartley, 1953). Since the presence of the u-terms multiplies the standard
deviation of √m x̄_i by the factor √(1 + mξ²), q_w will be distributed approximately as √(1 + mξ²) q,
where q is the 'Studentized' range for sample size l and degrees of freedom ν.


Table 1. Power of the q-test for a simple classification (random model);
power of corresponding F-test = 1 − β

1a. α = 0.05, 1 − β = 0.90               1b. α = 0.01, 1 − β = 0.90
 l \ m     3       6       ∞              l \ m     3       6       ∞
  4      0.90    0.90    0.90              4      0.89    0.89    0.89
  6      0.86    0.88    0.89              6      0.87    0.88    0.88
  8      0.85    0.87    0.87              8      0.85    0.86    0.87
 10      0.83    0.86    0.86             10      0.83    0.85    0.85

1c. α = 0.05, 1 − β = 0.75               1d. α = 0.01, 1 − β = 0.75
 l \ m     3       6       ∞              l \ m     3       6       ∞
  4      0.74    0.74    0.74              4      0.73    0.74    0.74
  6      0.68    0.72    0.73              6      0.70    0.71    0.72
  8      0.67    0.70    0.71              8      0.68    0.69    0.69
 10      0.65    0.69    0.69             10      0.64    0.67    0.67



































As pointed out by Patnaik (1950) the power function of the q-test can be obtained from
tables of the probability integral of q and may be compared with the power of the
corresponding F-test above. We do this by employing a general method for the comparison of
the power function of two different tests of the same hypothesis (see Hartley, 1950b). For
an assigned significance level α we find the value of ξ for which the power of the F-test has
a selected value 1 − β; we then find the power of the q-test for the same value of ξ. If
F(α; ν₁, ν₂) is the upper 100α % point of F for degrees of freedom ν₁, ν₂, it follows that the
value of ξ is given by

$$ 1 + m\xi^2 = F(\alpha; \nu_1, \nu_2) / F(1-\beta; \nu_1, \nu_2). $$

Hence, using a similar notation for probability points of q, the power of the q-test for the
same value of ξ is

$$ \Pr[\sqrt{(1 + m\xi^2)}\, q > q(\alpha; l, \nu)] = \Pr[q > q(\alpha; l, \nu) \sqrt{\{F(1-\beta; \nu_1, \nu_2) / F(\alpha; \nu_1, \nu_2)\}}]. $$
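The two steps of this comparison reduce to simple arithmetic once the F percentage points have been read from tables. The sketch below is illustrative only: the function names are hypothetical, the variance ratio σ'/σ is written `xi`, and the F and q points passed in are assumed to come from published tables rather than being computed here.

```python
import math

def xi_for_f_power(f_alpha, f_one_minus_beta, m):
    """Value of xi = sigma'/sigma at which the F-test has power 1 - beta.

    Uses 1 + m*xi**2 = F(alpha; nu1, nu2) / F(1 - beta; nu1, nu2).
    """
    ratio = f_alpha / f_one_minus_beta
    return math.sqrt((ratio - 1.0) / m)

def q_threshold(q_alpha, f_alpha, f_one_minus_beta):
    """Point of the 'Studentized' range whose upper tail area is the q-test power.

    The power is Pr[q > q_alpha * sqrt(F(1 - beta) / F(alpha))].
    """
    return q_alpha * math.sqrt(f_one_minus_beta / f_alpha)

# Made-up table values, purely to show the arithmetic (not taken from the paper):
xi = xi_for_f_power(f_alpha=4.0, f_one_minus_beta=0.2, m=3)
threshold = q_threshold(q_alpha=4.5, f_alpha=4.0, f_one_minus_beta=0.2)
```

The power of the q-test is then the upper tail area of the 'Studentized' range beyond `threshold`, read from tables of its probability integral.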
Results for the cases α = 0.05, 0.01; β = 0.10, 0.25 and selected values of l and m are
given in Table 1. Thus from Table 1a we see, for example, that in testing for the significance
of differences in means in eight categories, each containing six observations (l = 8, m = 6),
a power of 0.90 given by the standard test is reduced to 0.87 when the present q-test is
employed. It is evident from the table as a whole that the loss of power is small, especially
in the commonest situations where the number of groups l is not very large nor the number
within groups m very small. The effect of l on the power is clearly more marked than that
of m. As m → ∞ the problem reduces to that of comparing the range of the group means with
the between-groups mean square as criteria for detecting departure of the between-groups
population variance from σ'² = 0.


2·1. A modified form of the test
It is of some interest to determine the loss in power when only the denominator of the
F-ratio is replaced by a range estimator (since presumably most of the reduction in power is
due to the use of a single range in the numerator of q). The resulting test criterion is

$$ \frac{m \sum_i (\bar{x}_i - \bar{x})^2 / (l-1)}{(\bar{w}/c)^2}, $$

which is distributed approximately as F with degrees of freedom l − 1, ν. Using a
significance level α, its power to detect a σ'² which the ordinary F-ratio test returns as significant
with power 1 − β, is given approximately by the Incomplete Beta-ratio

$$ I_x\{\tfrac{1}{2} l(m-1), \tfrac{1}{2}(l-1)\} = I_x(\tfrac{1}{2}\nu_2, \tfrac{1}{2}\nu_1), \qquad (3) $$

where x = ν₂/(ν₂ + ν₁F′), F′ = F(α; ν₁, ν) F(1 − β; ν₁, ν₂)/F(α; ν₁, ν₂).
We have confined ourselves to the case m = 3 for which the drop in power due to the use of
the 'Studentized' range criterion is most substantial. (To two-decimal accuracy the results
are the same for l = 6, 8, 10.)

















                 l = 4                     l = 6, 8, 10
   β      α = 0.05   α = 0.01         α = 0.05   α = 0.01
  0.10      0.90       0.90             0.89       0.89
  0.25      0.74       0.74             0.74       0.73




















As expected, the loss in power is small, and there may be circumstances in which the present 
hybrid test criterion is of practical value. 
2·2. Test for a double classification
Very similar results to those in Table 1 are obtained for the double classification

$$ x_{ij} = K + u_i + v_j + z_{ij} \qquad (i = 1, \ldots, l;\ j = 1, \ldots, m). $$

These are set out in Table 2, which is relevant to the power of the 'Studentized' range test
for the existence of the u_i (the power of tests for the v_j follows by symmetry). m has been
taken as 4, 7, ∞ so as to give F-ratios with the same degrees of freedom as corresponding
entries in Table 1 with m = 3, 6, ∞.

Table 2. Power of the q-test for a double classification into
l treatments and m blocks (random model)

2a. α = 0.05, 1 − β = 0.90               2b. α = 0.01, 1 − β = 0.90
 l \ m     4       7       ∞              l \ m     4       7       ∞
  4      0.90    0.89    0.90              4      0.90    0.89    0.89
  6      0.88    0.89    0.89              6      0.89    0.88    0.88
  8      0.87    0.87    0.87              8      0.87    0.87    0.87

2c. α = 0.05, 1 − β = 0.75               2d. α = 0.01, 1 − β = 0.75
 l \ m     4       7       ∞              l \ m     4       7       ∞
  4      0.75    0.74    0.74              4      0.75    0.74    0.74
  6      0.71    0.72    0.73              6      0.73    0.71    0.72
  8      0.71    0.71    0.71              8      0.71    0.70    0.69

2·3. Power of modified F-ratio test for interaction
In testing for an interaction a criterion of the form

$$ F_r = \left( \frac{\bar{W}_1 / c_1}{\bar{W}_2 / c_2} \right)^2 $$

arises through the use of range methods. F_r is distributed approximately as an F-ratio with
modified degrees of freedom ν₁′, ν₂′ which, as well as c₁, c₂, are obtainable from appropriate
tables given by David (1951). It is clear that a power function comparison of the F_r- and
F-tests may be made precisely as in (3). We have taken two cases typical of a split-plot
experiment, but similar results will hold in general. We consider the comparison of 'error (1)'
against 'error (2)', which is often carried out as a preliminary test of significance:














           l = 5, m = 4, n = 3             l = 3, m = 3, n = 5
   β      α = 0.01   α = 0.05         α = 0.01   α = 0.05
  0.10      0.87       0.88             0.87       0.87
  0.25      0.71       0.72             0.72       0.73























3. THE SYSTEMATIC SET-UP
We confine ourselves to the simple classification

$$ x_{ij} = K + \lambda_i + z_{ij} \qquad (i = 1, \ldots, l;\ j = 1, \ldots, m), $$

where the systematic effects λ_i are subject to the condition Σλ_i = 0. Similar results will
be seen to hold for other models.
The case of two groups (l = 2) need not be considered here as the test ratio (2) is simply

$$ \sqrt{m}\, |\bar{x}_1 - \bar{x}_2| / (\bar{w}/c), $$

and is equivalent to Lord's u-criterion appropriate to a two-sided test for the significance
of the difference of two means (see § 1).
For l > 2 no simple procedure for the evaluation of the power of the q-test seems possible.
We shall proceed by illustrating that, as in § 2·1, the loss in power due to the mean-range
denominator alone is small, and by then investigating the more important changes in power
introduced by the single-range numerator.











From Tang's (1938) tables it is easy to obtain the reduction in power when the degrees
of freedom in the denominator of the F-ratio are reduced from l(m − 1) to ν, to give what we
may term the F_ν-test. We give two examples in the manner of Tang's table with

$$ \phi = \sqrt{(m \sum \lambda_i^2 / l)}. $$

Example. (i) l = 6, m = 6; (ii) l = 8, m = 3.

Values of 1 − β

              α = 0.05                                  α = 0.01
  φ        (i)              (ii)            φ        (i)              (ii)
          F      F_ν       F      F_ν               F      F_ν       F      F_ν
 1.5    0.738   0.728    0.711   0.696     2.0    0.824   0.810    0.762   0.736
 2.0    0.952   0.948    0.942   0.937     2.5    0.974   0.969    0.952   0.941









































3·1. The power of the range test
A consideration of the numerator of q alone is clearly equivalent to an investigation of an
ordinary range test, i.e. one for which σ is known. (2) now reduces to

$$ \sqrt{m}\, \operatorname{range}(\bar{x}_i) / \sigma, $$

a non-central range if the expectations of the x̄_i differ. We shall write, using a dash to denote
non-centrality,

$$ w' = \operatorname{range}(y_i), $$

where the y_i follow normal distributions with expectations α_i (where Σα_i = 0) and unit
variances.
Writing z(x) = e^{−½x²}/√(2π), the probability integral of w' is given by

$$ P_l(w') = \sum_{i=1}^{l} \int_{-\infty}^{\infty} z(x - \alpha_i) \prod_{h \ne i} \left( \int_x^{x+w'} z(\eta - \alpha_h)\, d\eta \right) dx. \qquad (4) $$


The power (1 − β) of the range test at significance level α, namely, 1 − P_l(w_α), is a symmetric
function of the l expectations α_i.

In view of the constraint Σα_i = 0, the power is also a symmetric function of any l − 1 of
the α_i but does not appear to be expressible (unlike the corresponding probability integral
of non-central χ² which depends only on Σα_i²) as a function of less than these l − 1 independent
parameters (compare § 4).

We therefore confine attention to a few important special cases for which the evaluation
of P_l(w_α) by approximate quadrature is practicable.

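(a) The case of a single outlying mean. For this model we may write

$$ \alpha_j = -\mu/l \quad (j = 1, \ldots, l-1), \qquad \alpha_l = (l-1)\mu/l. $$

From (4) we have after a simple translation

$$ P_l(w') = (l-1) \int_{-\infty}^{\infty} z(x) \left( \int_x^{x+w'} z(\xi)\, d\xi \right)^{l-2} \left( \int_x^{x+w'} z(\eta - \mu)\, d\eta \right) dx $$
$$ \qquad + \int_{-\infty}^{\infty} z(x - \mu) \left( \int_x^{x+w'} z(\xi)\, d\xi \right)^{l-1} dx. $$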


The second term is very small, as it represents the contribution to P_l(w_α) when the outlier
gives rise to the smallest observation. The power of the corresponding χ²-test was obtained
from the tables of non-central χ² given by Patnaik (1949) and Fix (1949) and from the
ν = ∞ line of Tang's tables, with λ = Σα_i² = (l − 1)μ²/l and φ = √(λ/l). In a few cases
Patnaik's χ²-approximation to χ'² had to be used.


Table 3. Comparison of χ²- and w-tests for a single outlying mean.
Values of the power (1 − β)

               l = 2         l = 6              l = 10
  α      μ     χ², w       χ²       w         χ²       w
 0.05    3     0.563      0.53     0.56       -        -
         4     0.807      0.82     0.83      0.76     0.80
         5     0.942      0.96     0.97      0.94     0.96
         6     0.989       -        -        0.994    0.996
 0.01    4     0.600      0.62     0.63      0.54     0.61
         5     0.831      0.88     0.88      0.82     0.87
         6     0.952      0.98     0.98      0.97     0.98





























It will be seen that except for l = 2, when the two tests are equivalent, the range test is
slightly more powerful than the χ²-test. This is not surprising in a test for outliers and a
similar conclusion holds when, for example, there are two outliers, one at each end.*
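For l = 2 the power can be written in closed form, which gives a quick check on the first column of the table. With expectations −μ/2 and μ/2 and unit variances, the difference of the two observations is normal with mean μ and variance 2, and the range test rejects when its absolute value exceeds √2 times the two-sided normal deviate. A minimal sketch using only the Python standard library (the function name is illustrative):

```python
from statistics import NormalDist

def power_two_means(mu, alpha):
    """Power of the range (equivalently chi-square) test when l = 2.

    The two observations have unit variance and expectations -mu/2 and mu/2.
    """
    std = NormalDist()
    z = std.inv_cdf(1.0 - alpha / 2.0)   # two-sided normal deviate
    shift = mu / 2.0 ** 0.5              # standardized separation of the means
    return (1.0 - std.cdf(z - shift)) + std.cdf(-z - shift)
```

For example, `power_two_means(4, 0.05)` and `power_two_means(5, 0.01)` agree with the tabulated 0.807 and 0.831 to three decimals.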

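(b) The case of two equal clusters. For simplicity take l even, then this set-up is defined by

$$ \alpha_i = -\tfrac{1}{2}\mu \quad (i = 1, \ldots, \tfrac{1}{2}l), \qquad \alpha_i = \tfrac{1}{2}\mu \quad (i = \tfrac{1}{2}l + 1, \ldots, l), $$

and P_l(w') given by

$$ P_l(w') = \tfrac{1}{2} l \int_{-\infty}^{\infty} z(x) \left( \int_x^{x+w'} z(\xi)\, d\xi \right)^{\frac{1}{2}l - 1} \left( \int_x^{x+w'} z(\eta - \mu)\, d\eta \right)^{\frac{1}{2}l} dx $$
$$ \qquad + \tfrac{1}{2} l \int_{-\infty}^{\infty} z(x - \mu) \left( \int_x^{x+w'} z(\xi)\, d\xi \right)^{\frac{1}{2}l} \left( \int_x^{x+w'} z(\eta - \mu)\, d\eta \right)^{\frac{1}{2}l - 1} dx. $$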

A procedure similar to that in (a) yields the following comparisons: 

















α = 0.05
            l = 2         l = 6              l = 10
  μ         χ², w       χ²       w         χ²       w
  3         0.563      0.823    0.74      0.94     0.82
  4         0.807      0.981    0.95      0.999    0.98






















For l = 2, this model is identical with (a). In other cases the range method is less powerful.


* Results reached by Dixon (1950) are of interest in this context. He plots for various tests for
outliers, including the above w- and χ²-tests, 'performance' curves obtained by sampling methods and
finds the range test superior for all ~. However, his ‘performance’ is not measured in quite the same 
way as our ‘power’, as he discards all samples for which 2, is not the largest value in the sample. 





4. SOME GENERAL COMMENTS


We now consider to what extent the conclusions regarding the range test of § 3·1 can be
converted into conclusions about 'Studentized' range tests. Case (a) is of interest in
suggesting that when the models there discussed are relevant a better test than the standard
one of the analysis of variance can be obtained by replacing the F-ratio by

$$ \sqrt{m}\, \operatorname{range}(\bar{x}_i) / s, \qquad \text{where } s = \sqrt{\{\textstyle\sum\sum (x_{ij} - \bar{x}_i)^2 / l(m-1)\}}, $$

giving a 'Studentized' range test with root-mean-square denominator. The test criterion
range (x̄_i)/(w̄/c) may also lead to a more powerful test, especially when the number of
degrees of freedom in the numerator is large, but the issue is somewhat obscured by the small
loss in power due to the use of w̄/c in place of s. This does not, of course, contradict certain
optimum properties of the analysis of variance tests proved by Hsu (1941) and Wald (1942).
However, it shows that the non-central range distribution does not depend on Σα_i² alone,
as otherwise Hsu's result would not permit it to lead to a more powerful test than the
F-ratio.

These comments provide no rigorous conclusions. It would be possible, following lines 
similar to those of Hartley (1948), to obtain exact upper bounds for the difference between 
the power functions of the F- and q-tests in terms of the maximum difference between the 
power functions of the χ² and range tests. But as the distribution of non-central range is not
available, this approach cannot at present lead us to any numerical gauge of accuracy. 


My thanks are due to Dr H. O. Hartley for his guidance in this work which was carried 
out at University College, London. 
SOME SIMPLE APPROXIMATE TESTS FOR POISSON VARIATES


By D. R. COX 
Statistical Laboratory, University of Cambridge 


1. INTRODUCTION 


If events, such as accidents or stoppages of a machine, occur randomly in time at a true
rate λ, the number of events in a fixed time t follows a Poisson distribution of mean λt.
In inverse sampling, with n fixed, the time t up to the nth event is distributed as (2λ)⁻¹χ²_{2n},
where χ²_{2n} denotes a χ² variate with 2n degrees of freedom. Barnard (1946) has pointed out
that the latter result leads to convenient 'sequential' tests of hypotheses about λ. For
example, if we have inverse samples (n₁, t₁) and (n₂, t₂) from two populations, the hypothesis
that λ₁ = λ₂ is tested by referring F = t₁n₂/(t₂n₁) to the variance-ratio tables with (2n₁, 2n₂)
degrees of freedom. In the present note we show that, with a slight modification, Barnard's
tests apply almost exactly to direct Poisson sampling in which t is fixed and n is a random
variable. We obtain, in particular, a convenient test for the equality of Poisson means,
although the main use of the method is likely to be in more complicated situations where
λ is the product of unknown parameters.


2. THE BASIC APPROXIMATION 


In direct Poisson sampling in which the number of events occurring in a fixed time is
recorded, we have

$$ \text{prob}\,(\text{no. of events} \ge n) = \sum_{r=n}^{\infty} \frac{e^{-\lambda t} (\lambda t)^r}{r!} = \text{prob}\left( \frac{1}{2\lambda} \chi^2_{2n} < t \right), \qquad (1) $$

and

$$ \text{prob}\,(\text{no. of events} \ge n+1) = \text{prob}\left( \frac{1}{2\lambda} \chi^2_{2n+2} < t \right). \qquad (2) $$

If we wish to make an approximation to prob (no. of events ≥ n) in which the number of
events is treated as a continuous variate, it is reasonable to take a quantity intermediate
between (1) and (2). A natural choice is

$$ \text{prob}\,(\text{no. of events} \ge n) \simeq \text{prob}\left( \frac{1}{2\lambda} \chi^2_{2n+1} < t \right), $$

i.e. we calculate probabilities as if

$$ 2\lambda t \text{ is distributed as } \chi^2_{2n+1}. \qquad (3) $$

Thus the suggestion is that if we observe n events in a fixed time t, we may work
approximately as if n is fixed and 2λt is distributed as χ²_{2n+1}. The remainder of the paper is
concerned with the consequences of (3).
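The relation between the Poisson tail and the χ² integral, and the way the odd-degrees-of-freedom value sits between (1) and (2), can be illustrated numerically. The sketch below is a minimal stdlib-only illustration: `chi2_cdf` is a hypothetical helper that evaluates the χ² integral through the power series for the incomplete gamma function, and the mean λt = 4 with n = 6 is an arbitrary choice, not a case from the paper.

```python
import math

def chi2_cdf(x, df):
    """prob(chi-square on df d.f. < x), from the incomplete-gamma power series."""
    if x <= 0.0:
        return 0.0
    a, y = 0.5 * df, 0.5 * x
    term, total, k = 1.0, 1.0, 0
    while term > 1e-16 * total:
        k += 1
        term *= y / (a + k)
        total += term
    return total * math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))

def poisson_tail(n, mean):
    """Exact prob(no. of events >= n) for a Poisson variate with the given mean."""
    return 1.0 - sum(math.exp(-mean) * mean ** r / math.factorial(r) for r in range(n))

mean = 4.0   # lambda * t (illustrative value)
n = 6
exact_n = poisson_tail(n, mean)             # equals chi2_cdf(2 * mean, 2n), as in (1)
exact_n1 = poisson_tail(n + 1, mean)        # equals chi2_cdf(2 * mean, 2n + 2), as in (2)
approx = chi2_cdf(2.0 * mean, 2 * n + 1)    # intermediate value used in (3)
```

Here `approx` lies strictly between `exact_n1` and `exact_n`, which is the sense in which (3) treats the count as a continuous variate.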


3. COMPARISON OF TWO POPULATIONS 
If we sample two populations with rates of occurrence λ₁, λ₂ and in times t₁, t₂ observe
n₁, n₂ events, then, according to (3),

$$ \frac{t_1 (n_2 + \tfrac{1}{2})\, \lambda_1}{t_2 (n_1 + \tfrac{1}{2})\, \lambda_2} \qquad (4) $$

is distributed approximately as F with (2n₁ + 1, 2n₂ + 1) degrees of freedom. Thus we may
test the hypothesis that λ₁ = λ₂ by referring

$$ F = \frac{t_1 (n_2 + \tfrac{1}{2})}{t_2 (n_1 + \tfrac{1}{2})} \qquad (5) $$

to the F tables with (2n₁ + 1, 2n₂ + 1) degrees of freedom. Also (100 − 2α) % confidence
intervals for λ₁/λ₂ may be obtained from

$$ \frac{t_2 (n_1 + \tfrac{1}{2})}{t_1 (n_2 + \tfrac{1}{2})}\, F_- < \frac{\lambda_1}{\lambda_2} < \frac{t_2 (n_1 + \tfrac{1}{2})}{t_1 (n_2 + \tfrac{1}{2})}\, F_+, \qquad (6) $$

where F₋, F₊ are the lower and upper α % points of F with (2n₁ + 1, 2n₂ + 1) degrees of
freedom.
Example. Observations of the spinning of one batch of wool gave 5 ends down in 200 spindle hours and of a second batch 12 ends down in 180 spindle hours.

Assuming that the occurrence of ends down is random for each batch, we may test for the difference between batches by

$$F = \frac{200}{180}\times\frac{12.5}{5.5} = 2.53,$$

with (11, 25) degrees of freedom. The 5% point in the ordinary $F$ tables is 2.20, the 2½% point 2.56. Thus in a two-sided test the difference between batches is very nearly significant at 5%. The lower and upper 2½% points of $F$ with (11, 25) degrees of freedom are 1/3.16 and 2.56, so that by (6) a 95% confidence interval for the ratio of the stoppage rates is

$$\frac{180\times5.5}{200\times12.5}\times\frac{1}{3.16} < \frac{\lambda_1}{\lambda_2} < \frac{180\times5.5}{200\times12.5}\times2.56,$$

i.e. $0.125 < \lambda_1/\lambda_2 < 1.01$.

The test of the difference between batches could equally well be done by the conventional $\chi^2$ method, but the provision of a simple confidence interval for the ratio of the stoppage rates is a useful additional feature of the present approach.*
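The arithmetic of the wool example can be retraced with a short Python sketch. The function names are illustrative and are not part of the paper; the percentage points of $F$ (1/3.16 and 2.56) are simply the tabulated values quoted in the example and are passed in by hand rather than computed.

```python
def poisson_ratio_f(n1, t1, n2, t2):
    """Approximate F statistic (5) with its degrees of freedom."""
    f_stat = (t1 * (n2 + 0.5)) / (t2 * (n1 + 0.5))
    return f_stat, (2 * n1 + 1, 2 * n2 + 1)


def poisson_ratio_interval(n1, t1, n2, t2, f_lower, f_upper):
    """Confidence limits (6) for lambda1/lambda2, given tabulated F points."""
    scale = (t2 * (n1 + 0.5)) / (t1 * (n2 + 0.5))
    return scale * f_lower, scale * f_upper


# Batch 1: 5 ends down in 200 spindle hours; batch 2: 12 in 180.
f_stat, df = poisson_ratio_f(5, 200, 12, 180)
low, high = poisson_ratio_interval(5, 200, 12, 180, 1 / 3.16, 2.56)
print(round(f_stat, 2), df)           # 2.53 (11, 25)
print(round(low, 3), round(high, 2))  # 0.125 1.01
```

Keeping the tabulated points as arguments avoids committing the sketch to any particular routine for the percentage points of $F$.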


4. ACCURACY OF THE TEST 


The test (5) may be expected to be accurate for large samples; its accuracy for small samples may be investigated as follows. The distribution of $n_1$, $n_2$ corresponding to given $\lambda_1$, $\lambda_2$, $t_1$ and $t_2$ is exactly

$$\operatorname{prob}(n_1, n_2) = e^{-\lambda_1 t_1}\frac{(\lambda_1 t_1)^{n_1}}{n_1!}\; e^{-\lambda_2 t_2}\frac{(\lambda_2 t_2)^{n_2}}{n_2!}. \tag{7}$$

If we first determine the critical region of the test (5), we can then find the exact probability associated with the test by adding (7) over all points in the critical region. Table 1 gives the results of such calculations. In all cases a one-sided test of the hypothesis $\lambda_1 = \lambda_2$ against alternatives $\lambda_2 > \lambda_1$ has been examined.

The general conclusion from Table 1 is that except when the population means are very small, the approximate F test gives the probability of errors of the first kind sufficiently accurately for practical purposes. For samples of the same size, the test may be considered satisfactory at the 0.05 level if the true mean exceeds one, and satisfactory at the 0.01 level if the true mean exceeds two.


* Note added in proof. The derivation of confidence intervals for the ratio of Poisson means has 
recently been considered in detail by Chapman (1952). 
















It is natural to compare the approximate F test with the exact test of Przyborowski & Wilenski (1940), which is based on the distribution of $n_1$, $n_2$ conditional on $n_1+n_2$. The comparison is difficult because the latter test is discrete and the true size of the critical region at, say, the 5% level is appreciably less than 5%, when small numbers are involved. The critical region of the approximate F test appears always to consist of the critical region of the exact test with some additional points. It is thus roughly equivalent to Barnard's C.S.M. test (Barnard, 1947).


Table 1. Exact probabilities associated with various nominal significance levels of the approximate F test of the hypothesis $\lambda_1 = \lambda_2$ against alternatives $\lambda_2 > \lambda_1$


(a) Samples of equal size, $t_1 = t_2$

Population mean,                     Nominal significance level
$\lambda_1 t_1 = \lambda_2 t_2$      0.1      0.05     0.01     0.001
1                                    -        0.031    0.001    -
1.5                                  -        0.049    0.004    -
2                                    0.124    0.059    0.007    0.0001
3                                    -        0.065    0.011    -
4                                    -        0.062    0.012    -
5                                    -        0.058    0.012    -
6                                    0.102    0.054    0.012    -


(b) First sample size the larger, $t_1 = 3t_2$

Smaller population mean,     Nominal significance level
$\lambda_2 t_2$              0.05     0.01
1                            0.048    0.008
1.5                          0.055    0.009
2                            0.056    0.010
3                            0.056    0.011
4                            0.056    0.010
5                            0.051    0.010


(c) Second sample size the larger, $t_2 = 3t_1$

Smaller population mean,     Nominal significance level
$\lambda_1 t_1$              0.05     0.01
1                            0.012    3 × 10^-5
1.5                          0.038    5 × 10^-*
2                            0.056    0.002
3                            0.059    0.010
4                            0.059    0.013
5                            0.057    0.012

The approximate F test is an interesting consequence of (3) and may sometimes be preferred to conventional approximate $\chi^2$ methods. The detailed discussion does, moreover, show that (3) may lead to accurate results even in very small samples; in §7 we shall consider an application where a great simplification is achieved by the use of (3).


5. CONFIDENCE INTERVALS FOR A SINGLE MEAN 


The approximation (3) may be used to test the hypothesis that $\lambda = \lambda_0$, or to obtain confidence intervals for $\lambda$. An interval of confidence coefficient $(100-2\alpha)\%$ is thus given approximately by

$$\tfrac12\chi^2_{2n+1,-} < \lambda t < \tfrac12\chi^2_{2n+1,+}, \tag{8}$$

where $\chi^2_{\nu,\pm}$ are the upper and lower $\alpha\%$ points of $\chi^2$ with $\nu$ degrees of freedom. This may be compared with Garwood's (1936) confidence interval, which in the present notation is

$$\tfrac12\chi^2_{2n,-} < \lambda t < \tfrac12\chi^2_{2n+2,+}. \tag{9}$$

The interval (9) has a confidence coefficient of at least $(100-2\alpha)\%$, whatever the true mean $\lambda t$; the true confidence coefficient is a serrated function of $\lambda t$ which, for small $\lambda t$, has most of its values appreciably greater than $(100-2\alpha)\%$.

The interval (8) is always narrower than (9), and the true confidence coefficient is sometimes less than $(100-2\alpha)\%$. In many practical applications it would be reasonable to assume that over a number of applications of the method the true means $\lambda t$ are distributed randomly with respect to the serrations of the graph of the confidence coefficient. In this case it would be justifiable to use a confidence interval for which the average confidence coefficient over any fairly small range of $\lambda t$ is at least $(100-2\alpha)\%$. The interval (8) appears to satisfy this condition provided that $n$ and $\alpha$ are not very small, and it might therefore be claimed that (8) is preferable to (9). The point is, however, of little practical importance because the reduction in width from (9) to (8) is only small.

Example. Seven stoppages of a machine are observed in a certain period. 90% confidence intervals for the population number of stoppages are, according to (8), (3.63, 12.50) and according to (9), (3.29, 13.15).
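The two intervals of the example can be checked numerically. The following is a minimal Python sketch, not the paper's own procedure: it assumes the $\chi^2$ percentage points are obtained by bisection on a series evaluation of the $\chi^2$ distribution function, and it takes $\alpha$ as a fraction (0.05) rather than a percentage. All function names are illustrative.

```python
import math


def chi2_cdf(x, df):
    """Distribution function of chi-square, via the power series for the
    regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_quantile(p, df):
    """Invert chi2_cdf by bisection."""
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def interval_8(n, alpha):
    """Interval (8): both limits from chi-square with 2n+1 degrees of freedom."""
    df = 2 * n + 1
    return 0.5 * chi2_quantile(alpha, df), 0.5 * chi2_quantile(1.0 - alpha, df)


def interval_9(n, alpha):
    """Garwood's interval (9): 2n d.f. below, 2n+2 d.f. above."""
    lower = 0.0 if n == 0 else 0.5 * chi2_quantile(alpha, 2 * n)
    return lower, 0.5 * chi2_quantile(1.0 - alpha, 2 * n + 2)


lower8, upper8 = interval_8(7, 0.05)
lower9, upper9 = interval_9(7, 0.05)
print(round(lower8, 2), round(upper8, 2))
print(round(lower9, 2), round(upper9, 2))
```

For $n = 7$ the limits should agree with the values quoted in the example to within rounding.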


6. RELATION WITH INVERSE SAMPLING 


The close connexion between the tests given in §§ 3 and 5 and those based on inverse sampling has been mentioned briefly in §§ 1 and 2. This connexion will now be discussed in more detail.*

Tests based on the measurement of the intervals between randomly occurring events have been described in full by Maguire, Pearson & Wynn (1952). In particular, their test for the significance of the difference between the rates of occurrence in two series is as follows. Let $n_1$, $n_2$ be fixed and let $t_1$, $t_2$ be random variables defined as the times in the two series up to the $n_1$th, $n_2$th events. Then $F = t_1 n_2/(t_2 n_1)$ may be tested exactly as a variance ratio with $(2n_1,\ 2n_2)$ degrees of freedom. The test of § 3, on the other hand, is that if $t_1$, $t_2$ are fixed times and if random variables $n_1$, $n_2$ are defined as the numbers of events occurring in times $t_1$, $t_2$, then $F = t_1(n_2+\tfrac12)/\{t_2(n_1+\tfrac12)\}$ may be tested approximately as a variance ratio with $(2n_1+1,\ 2n_2+1)$ degrees of freedom.


* I am very grateful to Prof. E. S. Pearson for some helpful comments leading to the addition of this section.
In the first case the interval $t_1$ ends with the $n_1$th event, but in the second case this is not in general so. The extra degrees of freedom, and the ½'s in the definition of $F$, may be thought of as accounting for the periods between the last events and the close of the periods of observation. In many applications in which $t_1$, $t_2$ are fixed, the precise instants at which the $n_1$th, $n_2$th events occur would not be known.

It will frequently happen in applications that neither $t_1$, $t_2$ nor $n_1$, $n_2$ are fixed in advance of the observations. If, however, the periods $t_1$, $t_2$ are determined by some random process that is quite uninfluenced by the numbers of events occurring, then conditionally on $t_1$, $t_2$, the quantities $n_1$, $n_2$ follow Poisson distributions and the approximate F test may be applied. Similarly, if the time intervals are measured up to the $n_1$th, $n_2$th events, where $n_1$, $n_2$ are determined by a random process independent of the observations, then the exact F test with $(2n_1,\ 2n_2)$ degrees of freedom is applicable.

There are many other possible ways in which $t_1$, $t_2$ might be determined; for example, $t_1$, $t_2$ might be chosen by some random process correlated with, but not completely determined by $n_1$, $n_2$. In such cases it is not possible to find the properties of the above tests without special investigation, although a reasonable general rule is to use the first F test, with $(2n_1,\ 2n_2)$ degrees of freedom, whenever the intervals $t_1$, $t_2$ are ended by events, and the second test otherwise. This rule breaks down in extreme cases; for example, if $t_1$, $t_2$ are determined by a likelihood-ratio sequential test for comparing the rates of occurrence, it would clearly be entirely inappropriate to apply either F test.
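The general rule of this section can be put side by side in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the figures fed to both functions are those of the wool example of §3, reused here purely to show how the two statistics differ on the same numbers.

```python
def f_inverse_sampling(t1, n1, t2, n2):
    """Intervals ended by the n1th and n2th events: exact F, (2n1, 2n2) d.f."""
    return t1 * n2 / (t2 * n1), (2 * n1, 2 * n2)


def f_fixed_time(t1, n1, t2, n2):
    """Fixed observation times: approximate F, (2n1 + 1, 2n2 + 1) d.f."""
    return t1 * (n2 + 0.5) / (t2 * (n1 + 0.5)), (2 * n1 + 1, 2 * n2 + 1)


exact_f, exact_df = f_inverse_sampling(200, 5, 180, 12)
approx_f, approx_df = f_fixed_time(200, 5, 180, 12)
print(round(exact_f, 3), exact_df)    # 2.667 (10, 24)
print(round(approx_f, 3), approx_df)  # 2.525 (11, 25)
```

The halves added to the counts and the extra degree of freedom in each sample are the only differences between the two calculations.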


7. THE LOGARITHMIC TRANSFORMATION


The basic approximation (3) is very conveniently expressed in logarithmic form, and so may be expected to be useful whenever $\lambda$ is the product of unknown parameters. If $f_n$ is any function of $n$ we may rewrite (3)

$$\log\frac{f_n}{t} = \log\lambda + \log f_n - \log\tfrac12\chi^2_{2n+1}.$$

The $\log\chi^2$ distribution has been studied in detail by Bartlett & Kendall (1946) and by Wishart (1947), who have, in particular, shown that

$$E\log\tfrac12\chi^2_\nu = \psi(\tfrac12\nu), \qquad \operatorname{var}\log\tfrac12\chi^2_\nu = \psi'(\tfrac12\nu),$$

where $\psi$, $\psi'$ are the digamma and trigamma functions. It is convenient to choose $f_n$ so that $\log f_n/t$ is an unbiased estimate of $\log\lambda$. Thus we take $f_n = \exp\{\psi(n+\tfrac12)\}$, and for the present purpose it is accurate enough to write

$$f_0 = 0.14, \qquad f_n = n \quad (n = 1, 2, \ldots). \tag{10}$$

Then

$$\operatorname{var}\log\frac{f_n}{t} = \psi'(n+\tfrac12) \simeq \frac{1}{n} \quad (n = 1, 2, \ldots). \tag{11}$$

Thus if we define a transformed variate by

$$z = \begin{cases}\log_{10}(0.14/t) & \text{if } n = 0,\\ \log_{10}(n/t) & \text{if } n \neq 0,\end{cases} \tag{12}$$

then approximately $E(z) = \log_{10}\lambda$ and $\operatorname{var} z = (\log_{10}e)^2 v_n$, where

$$v_n = \begin{cases}4.93 & (n = 0),\\ 1/n & (n \neq 0).\end{cases} \tag{13}$$


One use of the transformation will now be described. 

Suppose observations of stoppage rates are made for $k$ machines on each of which stoppages occur at random, the rates for the different machines being different. Suppose that a change in the process is introduced and that fresh observations are then made. In some cases it would be reasonable to expect a constant proportional change in stoppage rate, i.e. to expect that if the initial stoppage rates are $\lambda_1, \ldots, \lambda_k$ the final stoppage rates will be $\rho\lambda_1, \ldots, \rho\lambda_k$. If the observations are, in the initial period $(n_1, t_1), \ldots, (n_k, t_k)$, and in the final period $(n'_1, t'_1), \ldots, (n'_k, t'_k)$, we may define transformed variates $z_1, \ldots, z_k$ and $z'_1, \ldots, z'_k$ as in (12). Then $u_i = z'_i - z_i$ has mean $\mu = \log_{10}\rho$ and known variance $(\log_{10}e)^2(v_i + v'_i) = (\log_{10}e)^2 w_i$, say. The best estimate of $\mu$ is thus

$$\hat\mu = \frac{\Sigma u_i w_i^{-1}}{\Sigma w_i^{-1}}. \tag{14}$$

An almost unbiased estimate of the parameter $\rho$ is

$$\hat\rho = \left\{1 - \frac{1}{2\Sigma w_i^{-1}}\right\}10^{\hat\mu}. \tag{15}$$

An approximate test of the hypothesis of a proportional change in stoppage rate may be obtained by calculating

$$\chi^2_{k-1} = \left\{\Sigma w_i^{-1}(u_i - \hat\mu)^2\right\}\big/(\log_{10}e)^2. \tag{16}$$


Example. The following data were the results of a sampling experiment with $\rho = 2$, the values of $T_i$, $T'_i$ being chosen arbitrarily.

        Initial                Final
 i      T_i (hr.)   n_i        T'_i (hr.)   n'_i       u_i       1/w_i
 1      30           2         28            5         0.428     1.429
 2      26           4         41           21         0.522     3.360
 3      21           2         24            3         0.118     1.200
 4      45          18         16           11         0.235     6.828
 5      30           4         32           12         0.449     3.000

$\Sigma w_i^{-1} = 15.817$, $\Sigma u_i w_i^{-1} = 5.4611$, $\Sigma u_i^2 w_i^{-1} = 2.1774$.

The last two columns give

$$u_i = z'_i - z_i = \log_{10}\frac{T_i n'_i}{T'_i n_i}, \quad\text{and}\quad w_i^{-1} = (v_i + v'_i)^{-1} = \frac{n_i n'_i}{n_i + n'_i},$$

no frequency being zero.

$$\hat\mu = \Sigma u_i w_i^{-1}/\Sigma w_i^{-1} = 0.345 \quad\text{and}\quad \operatorname{var}\hat\mu = (0.4343)^2/15.817,$$

so that standard error $(\hat\mu) = 0.109$. There is thus good agreement with the true value $\mu = \log_{10}2 = 0.3010$. $\rho$ is estimated by

$$\hat\rho = \left(1 - \frac{1}{2\times15.817}\right)10^{\hat\mu} = 2.145.$$

The hypothesis of a proportional change in stoppage rate is tested by

$$\chi^2_4 = (2.3026)^2\,\Sigma w_i^{-1}(u_i - \hat\mu)^2 = (2.3026)^2\left\{\Sigma u_i^2 w_i^{-1} - (\Sigma w_i^{-1})\hat\mu^2\right\} = 1.55,$$

indicating a good agreement.
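The calculation of the example follows directly from (12) to (16) and can be sketched in Python. The function and variable names below are illustrative; the constant 0.14 for a zero count comes from (10), and the variance for a zero count is taken as $\pi^2/2$, the value of the trigamma function at ½, which rounds to the 4.93 of (13).

```python
import math

LOG10_E = math.log10(math.e)   # 0.4343
V_ZERO = math.pi ** 2 / 2      # trigamma(1/2), about 4.93


def transform(n, t):
    """Return (z, v_n) as defined in (12) and (13)."""
    if n == 0:
        return math.log10(0.14 / t), V_ZERO
    return math.log10(n / t), 1.0 / n


def proportional_change(initial, final):
    """initial, final: lists of (count, time) pairs, one pair per machine."""
    u, inv_w = [], []
    for (n, t), (n_new, t_new) in zip(initial, final):
        z, v = transform(n, t)
        z_new, v_new = transform(n_new, t_new)
        u.append(z_new - z)
        inv_w.append(1.0 / (v + v_new))
    total = sum(inv_w)
    mu = sum(ui * wi for ui, wi in zip(u, inv_w)) / total          # (14)
    std_err = LOG10_E / math.sqrt(total)
    rho = (1.0 - 1.0 / (2.0 * total)) * 10.0 ** mu                 # (15)
    chi2 = sum(wi * (ui - mu) ** 2 for ui, wi in zip(u, inv_w)) / LOG10_E ** 2  # (16)
    return mu, std_err, rho, chi2


initial = [(2, 30), (4, 26), (2, 21), (18, 45), (4, 30)]
final = [(5, 28), (21, 41), (3, 24), (11, 16), (12, 32)]
mu, std_err, rho, chi2 = proportional_change(initial, final)
print(round(mu, 3), round(std_err, 3), round(rho, 2), round(chi2, 2))
```

The results should agree with the worked figures above to within rounding, with $\chi^2$ carrying $k - 1 = 4$ degrees of freedom.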


This problem could be tackled without introducing the device (3) by the method of Dyke & Patterson (1952), using an iterative solution of maximum-likelihood equations. The present method is considerably quicker. The methods are probably asymptotically equivalent when the $n_i$ and $n'_i$ are all large. The transformation (12) can be applied in a similar way to more complicated problems involving Poisson variates.


I am grateful to Mr D. A. East and Miss Patricia Johnson who did most of the calculations.
ORTHOGONAL POLYNOMIAL FITTING


By JOHN WISHART AND THEOCHARIS METAKIDES
Statistical Laboratory, University of Cambridge 


1. There are considerable advantages to be gained by using orthogonal polynomials 
when fitting curves to data by the method of least squares. The computer whose data are 
equally spaced, and of equal weight, is well catered for by tables, the best known of which
are due to Fisher & Yates (1938-53), extended by Anderson & Houseman (1942); similar 
tables, due to Van der Reyden (1943), have been partly reproduced in the new Biometrika 
Tables for Statisticians (Pearson & Hartley, 1953). In the general case of unequally spaced 
and weighted data similar tables are impracticable; what the computer needs is a good 
calculation scheme, so designed that it makes little appeal to the memory, or the mathe- 
matics, and is as far as possible self-checking. There is only one mathematical basis for the 
methods, but a number of computing techniques have been devised, chiefly in order to carry 
out the inversion of the matrix whose elements are the equivalent of the variance and 
covariance estimates in the parallel problem of multiple regression. The only scheme known 
to the authors as having appeared in Biometrika is due to Isserlis (1927), but this needs 
revision. Elsewhere there are protagonists of the Gauss-Doolittle methods, described in 
more than one text-book; of the pivotal condensation method due to Aitken (1933, 1937); 
and of the Choleski method dealt with by Turing (1948), Fox, Huskey & Wilkinson (1948), 
Fox (1950), Fox & Hayes (1951), Rushton (1951), Hayes & Vickers (1951), and others. The 
recent papers by Guest (1950, 1953) deal with the mathematics of orthogonal polynomial 
fitting, and a practical computation scheme is given in the former, consisting of a combina- 
tion of the Doolittle and pivotal condensation methods. (In fact, the available methods 
seem to differ only in the order and arrangement of the computations.) 

The purpose of the present paper is to supplement the Biometrika Tables by offering 
a single recommended computation scheme for the general case. Like some other schemes, 
it is set out in tabular or ‘blank’ form, for ease of general manipulation. It is approached 
through multiple regression analysis, and to stress this fact the standard regression notation 
has been used. The arrangement is such that everything that can possibly be wanted is 
obtained within the tabular scheme. One point which may be considered convenient, e.g. for 
plotting purposes, is that while the orthogonal polynomials are calculated, the result is also 
given at any stage as a simple polynomial. One of the authors has examined the alternative 
schemes in application to the illustrative example given later in the paper, and the conclusion is that the method and scheme now offered are the most convenient for use by the average computer.


2. We begin by outlining a useful method of conducting a multiple regression analysis 
which brings to account one independent variate at a time, so that the successive terms of 
the final regression equation are orthogonal to one another, although it can in the end be 
converted to the usual form. Although well known, computational details of the method 
are not very readily accessible. They were published a few years ago by one of the authors 
(Wishart, 1950), while Woolf (1951) later gave practical directions for regression calculations 
by successive insertion of additional variates, using throughout the Gauss multipliers. 
Consider a dependent variate $x_n$, usually assumed for significance test purposes to be normally distributed, and independent variates $x_1, x_2, \ldots, x_{n-1}$, usually assumed (for the same purposes) to be fixed; these are intended to be dealt with in the order stated. Let $w$ denote the weight to be given to a sample observation. Add an additional 'independent' variate $x_0$ having a fixed value unity. Let $\Sigma$ denote a sum of weighted products (or squares) of the designated variate measures taken over the sample, the size of which is $N$.

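For $p, q = 0, 1, 2, \ldots, n$ we write

$$S_{pq} = \Sigma(w x_p x_q), \tag{2-1}$$

and then calculate in succession the partial sums

$$\begin{aligned}
S_{p1\cdot0} &= S_{p1} - S_{p0}S_{10}/S_{00} && (p = 1, 2, \ldots, n),\\
S_{p2\cdot01} &= S_{p2\cdot0} - S_{p1\cdot0}S_{21\cdot0}/S_{11\cdot0} && (p = 2, 3, \ldots, n),\\
S_{p3\cdot012} &= S_{p3\cdot01} - S_{p2\cdot01}S_{32\cdot01}/S_{22\cdot01} && (p = 3, 4, \ldots, n),\\
S_{p4\cdot0123} &= S_{p4\cdot012} - S_{p3\cdot012}S_{43\cdot012}/S_{33\cdot012} && (p = 4, \ldots, n),
\end{aligned} \tag{2-2}$$

and so on. The displayed formulae (with $n = 4$) carry us as far as three independent variates, apart from $x_0$. We have in particular

$$S_{00} = \Sigma(w), \qquad S_{p0} = \Sigma(w x_p).$$

The estimated regression coefficients, of progressively increasing order, are then calculated in turn, thus:

$$\begin{aligned}
b_{n0} &= S_{n0}/S_{00},\\
b_{n1\cdot0} &= S_{n1\cdot0}/S_{11\cdot0},\\
b_{n2\cdot01} &= S_{n2\cdot01}/S_{22\cdot01},\\
b_{n3\cdot012} &= S_{n3\cdot012}/S_{33\cdot012},
\end{aligned} \tag{2-3}$$

and so on. In particular $b_{n0} = \Sigma(w x_n)/\Sigma(w) = \bar{x}_n$.

Partial $x$-variates form the next sequence, and are calculated as below:

$$\begin{aligned}
x_{p\cdot0} &= x_p - S_{p0}x_0/S_{00},\\
x_{p\cdot01} &= x_{p\cdot0} - S_{p1\cdot0}x_{1\cdot0}/S_{11\cdot0},\\
x_{p\cdot012} &= x_{p\cdot01} - S_{p2\cdot01}x_{2\cdot01}/S_{22\cdot01},
\end{aligned} \tag{2-4}$$

and so on. The first of these is $x_{p\cdot0} = x_p - \bar{x}_p$.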


The multiple regression equation, written as a sum of orthogonal components, is then

$$X_n = b_{n0}x_0 + b_{n1\cdot0}x_{1\cdot0} + b_{n2\cdot01}x_{2\cdot01} + b_{n3\cdot012}x_{3\cdot012} + \ldots, \tag{2-5}$$

of which the first term on the right-hand side is $\bar{x}_n$. For each term fitted there is a 'sum of squares' due to the regression, with one degree of freedom and, when all have been fitted to the chosen number of independent variates, there is a residual sum of squares. The analysis of variance, taken up to three variates, is shown in Table 1; in this, on the usual assumptions, the various mean squares are distributed as variance estimates independently of one another, with the assigned numbers of degrees of freedom. In the table the contribution due to $\bar{x}_n$ has been kept separate from $S_{nn\cdot0}$; in the more usual form the total sum of squares would be $S_{nn\cdot0}$, with $N - 1$ degrees of freedom. The overall variance-ratio test of regression after bringing in three variates is illustrated, but note that this test could have been applied at an earlier stage, also that at any stage the mean square due to a single regression coefficient is tested against the residual mean square available at that stage.
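The successive reductions (2-2) and the coefficients (2-3) can be sketched as a simple sweep over the matrix of sums of products. The Python below is an illustration under stated assumptions, not the tabular scheme of the paper: the function name is hypothetical, the dependent variate is taken to occupy the last row and column, and the numbers are the weighted sums quoted in the numerical example of §5 (powers of $x$ up to the second as independent variates).

```python
def sweep_regression(sums, n_terms):
    """Bring in the independent variates one at a time.

    sums    symmetric matrix of weighted sums of products S_pq,
            dependent variate in the last row and column
    n_terms number of variates (including the dummy x_0) to bring in
    Returns the orthogonal coefficients b_{n0}, b_{n1.0}, ..., the sum of
    squares due to each, and the residual sum of squares.
    """
    s = [list(map(float, row)) for row in sums]   # work on a copy
    dep = len(s) - 1
    coefficients, due_to_regression = [], []
    for k in range(n_terms):
        pivot = s[k][k]
        b = s[dep][k] / pivot                     # as in (2-3)
        coefficients.append(b)
        due_to_regression.append(b * s[dep][k])
        column = [s[p][k] for p in range(len(s))]
        for p in range(k + 1, len(s)):            # reduction (2-2)
            for q in range(k + 1, len(s)):
                s[p][q] -= column[p] * column[q] / pivot
    return coefficients, due_to_regression, s[dep][dep]


sums = [
    [14, 91, 803, 78],
    [91, 803, 7705, 589],
    [803, 7705, 77447, 5049],
    [78, 589, 5049, 538],
]
coefficients, due_to_regression, residual = sweep_regression(sums, 3)
print([round(b, 6) for b in coefficients])
print([round(ss, 4) for ss in due_to_regression], round(residual, 4))
```

Each pass removes one variate from every remaining sum of products, so the analysis of variance can be read off term by term as the sweep proceeds.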





Table 1

Regression     Degrees of    Sums of squares                                                                            Mean
coefficient    freedom                                                                                                  squares
b_{n0}         1             b_{n0} S_{n0}         = S_{nn} - S_{nn·0}
b_{n1·0}       1             b_{n1·0} S_{n1·0}     = S_{nn·0} - S_{nn·01}      }
b_{n2·01}      1             b_{n2·01} S_{n2·01}   = S_{nn·01} - S_{nn·012}    }  S_{nn·0} - S_{nn·0123}                V_1
b_{n3·012}     1             b_{n3·012} S_{n3·012} = S_{nn·012} - S_{nn·0123}  }
Residual       N - 4         S_{nn·0123}                                                                                V_2
Total          N             S_{nn}


In practice it is helpful to choose as $x_1$ that variate which is most nearly expected to be associated with $x_n$; then as $x_2$ the next in expected strength of association, and so on. The calculations can be reviewed when a fitted term is found to be insignificant; the appropriate variate may be discarded and another substituted. As a result of this procedure it may be possible to say with a minimum amount of labour that $x_n$ is adequately 'predicted' by a limited number only of independent variates, a matter of some importance in a variety of problems, for example, in physical anthropology where large numbers of body and other measurements may be available, and for practical reasons it is desirable to deal with a small number of variates only for purposes of description.

The more symmetrical arrangement of the regression equation, giving all the partial 
regression coefficients of the highest order, and also tests of their individual significance and 
a final check, require additional calculations which can be readily inferred from the applica- 
tion of the method which is made in §3 to orthogonal polynomial fitting. 

It will be noted that the first stage in the scheme involves the determination of the weighted sums of products (and squares) of deviations of variate measures from their (weighted) means. That is, $S_{pq\cdot0}$ is determined from $S_{pq}$ by removing the usual 'correction factor' $S_{p0}S_{q0}/S_{00}$. If the quantities $S_{pq\cdot0}$ are given in the first instance, one stage in the computations is eliminated.

3. Turning now to the problem of orthogonal polynomial fitting, we read $x_n$ as $y$, to distinguish the dependent variate. We then let

$$x_0 = x^0 = 1, \quad x_1 = x, \quad x_2 = x^2, \quad x_3 = x^3, \quad \ldots,$$

so that the independent variates become the powers of $x$, including the dummy variate. In the special case of $n = N$, the sample size, so that there are $N - 1$ independent variates, the multiple regression equation will then become the polynomial of degree $N - 1$ of perfect fit.

The usual case is, however, where we endeavour to fit a polynomial of degree $p$, where $p < N - 1$. We then have

$$S_{rs} = S_{r-1,\,s+1} = \ldots = S_{sr} = \ldots = S_{0,\,r+s} = \Sigma(w x^{r+s}), \tag{3-1}$$

where $r$ and $s$ take independently the values $0, 1, 2, \ldots, p$. We define also

$$S_{yr} = \Sigma(w y x^r), \quad\text{needed for } r = 0, 1, 2, \ldots, p. \tag{3-2}$$

Equations (2-3) will be needed for $n = 1, 2, \ldots, p$ as well as for $n = y$, and we shall also find it convenient to calculate the quantities $b_{20\cdot1}$, $b_{30\cdot12}$, ... which are defined below. In particular $b_{r0}$ is the mean value of the $r$th powers of the $x$'s.

In virtue of (2-3), equations (2-2) may be calculated as

$$\begin{aligned}
S_{r1\cdot0} &= S_{r1} - b_{r0}S_{10} && (r = 1, 2, \ldots, p \text{ and } y),\\
S_{r2\cdot01} &= S_{r2\cdot0} - b_{r1\cdot0}S_{21\cdot0} && (r = 2, 3, \ldots, p \text{ and } y),\\
S_{r3\cdot012} &= S_{r3\cdot01} - b_{r2\cdot01}S_{32\cdot01} && (r = 3, \ldots, p \text{ and } y),
\end{aligned} \tag{3-3}$$
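and so on. Equation (2-5) becomes, in the present case,

$$Y = b_{y0}x_0 + b_{y1\cdot0}x_{1\cdot0} + b_{y2\cdot01}x_{2\cdot01} + b_{y3\cdot012}x_{3\cdot012} + \ldots. \tag{3-4}$$

In Fisher's notation (Fisher, 1925) the successive orthogonal polynomials are $\xi_0 = 1$, $\xi_1$, $\xi_2$, ..., with multipliers $A$, $B$, $C$, ..., so that the fitted polynomial is

$$Y = A + B\xi_1 + C\xi_2 + D\xi_3 + \ldots. \tag{3-5}$$

From equations (2-4), by inserting $b$-coefficients, we have

$$\begin{aligned}
x_{1\cdot0} &= x_1 - b_{10},\\
x_{2\cdot01} &= x_{2\cdot0} - b_{21\cdot0}x_{1\cdot0},\\
x_{3\cdot012} &= x_{3\cdot01} - b_{32\cdot01}x_{2\cdot01},
\end{aligned} \tag{3-6}$$

and so on. In virtue of the relations

$$\begin{aligned}
&\bar{x} = b_{10}, \quad \overline{x^2} = b_{20}, \quad \overline{x^3} = b_{30}, \ldots;\\
&b_{20} - b_{21\cdot0}b_{10} = b_{20\cdot1};\\
&b_{30} - b_{31\cdot0}b_{10} - b_{32\cdot01}b_{20\cdot1} = b_{30\cdot12};\\
&b_{31\cdot0} - b_{32\cdot01}b_{21\cdot0} = b_{31\cdot02};
\end{aligned} \tag{3-7}$$

and so on, we find, after appropriate substitution and simplification, replacing $x_0, x_1, x_2, \ldots$ by $x^0, x^1, x^2, \ldots$, that

$$\begin{aligned}
x_0 &= \xi_0 = 1,\\
x_{1\cdot0} &= \xi_1 = -b_{10} + x,\\
x_{2\cdot01} &= \xi_2 = -b_{20\cdot1} - b_{21\cdot0}x + x^2,\\
x_{3\cdot012} &= \xi_3 = -b_{30\cdot12} - b_{31\cdot02}x - b_{32\cdot01}x^2 + x^3,
\end{aligned} \tag{3-8}$$

and so on, and that

$$A = b_{y0}, \quad B = b_{y1\cdot0}, \quad C = b_{y2\cdot01}, \quad D = b_{y3\cdot012}, \quad \ldots. \tag{3-9}$$

By combining we may write down the different terms of $Y$ separately, or they may be cumulated one at a time, thus having the appropriate fitted curve at any stage expressed as a simple polynomial in $x$, as follows:

$$\begin{aligned}
Y_0 &= b_{y0},\\
Y_1 &= b_{y0\cdot1} + b_{y1\cdot0}x,\\
Y_2 &= b_{y0\cdot12} + b_{y1\cdot02}x + b_{y2\cdot01}x^2,\\
Y_3 &= b_{y0\cdot123} + b_{y1\cdot023}x + b_{y2\cdot013}x^2 + b_{y3\cdot012}x^3,
\end{aligned} \tag{3-10}$$

and so on, where $Y_r$ denotes (3-4) taken to degree $r$. The coefficients are calculated from (2-3), putting $n = y$, and from (3-7) with $y$ written for 2, 3, ....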
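The construction of (3-6) to (3-10) amounts to orthogonalising the successive powers of $x$ with respect to the weights and cumulating the fitted curve one term at a time. The Python below is a minimal sketch of that idea, not the tabular scheme of the paper: it works from the observations directly (a weighted Gram-Schmidt process) rather than from the sums $S_{rs}$, the function name is hypothetical, and the data are those of the numerical example of §5.

```python
def orthogonal_polynomial_fit(x, y, w, degree):
    """Return the fitted curves Y_0, ..., Y_degree, each as a list of
    coefficients of ascending powers of x."""

    def inner(f, g):
        return sum(wi * fi * gi for wi, fi, gi in zip(w, f, g))

    xi_values, xi_coeffs = [], []      # each xi_r at the data points / in powers of x
    fitted = [0.0] * (degree + 1)      # running coefficients of the fitted curve
    curves = []
    for r in range(degree + 1):
        values = [xv ** r for xv in x]
        coeffs = [0.0] * (degree + 1)
        coeffs[r] = 1.0
        # Remove from x^r its components along the earlier polynomials.
        for prev_values, prev_coeffs in zip(xi_values, xi_coeffs):
            b = inner(values, prev_values) / inner(prev_values, prev_values)
            values = [v - b * pv for v, pv in zip(values, prev_values)]
            coeffs = [c - b * pc for c, pc in zip(coeffs, prev_coeffs)]
        multiplier = inner(y, values) / inner(values, values)   # A, B, C, ...
        fitted = [f + multiplier * c for f, c in zip(fitted, coeffs)]
        xi_values.append(values)
        xi_coeffs.append(coeffs)
        curves.append(fitted[: r + 1])
    return curves


w = [1, 2, 1, 1, 2, 3, 1, 2, 1]
x = [0, 1, 3, 4, 7, 8, 10, 11, 12]
y = [0, 2, 5, 6, 9, 8, 7, 5, 4]
curves = orthogonal_polynomial_fit(x, y, w, 3)
for curve in curves:
    print([round(c, 5) for c in curve])
```

Because each new polynomial is orthogonal to its predecessors, the earlier multipliers are unchanged when a further term is added; only the coefficients of the simple polynomial are updated.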

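The analysis of variance will be as in Table 1 with $y$ substituted for $n$. The estimated variances of the coefficients of the power series in (3-10) may be written down in terms of the following 'Gauss multipliers':

$$\begin{aligned}
c_{00} &= S_{00}^{-1},\\
c_{00\cdot1} &= c_{00} - b_{10}c_{10}, & c_{11\cdot0} &= S_{11\cdot0}^{-1},\\
c_{00\cdot12} &= c_{00\cdot1} - b_{20\cdot1}c_{20\cdot1}, & c_{11\cdot02} &= c_{11\cdot0} - b_{21\cdot0}c_{21\cdot0},\\
c_{00\cdot123} &= c_{00\cdot12} - b_{30\cdot12}c_{30\cdot12}, & c_{11\cdot023} &= c_{11\cdot02} - b_{31\cdot02}c_{31\cdot02},\\
c_{22\cdot01} &= S_{22\cdot01}^{-1},\\
c_{22\cdot013} &= c_{22\cdot01} - b_{32\cdot01}c_{32\cdot01}, & c_{33\cdot012} &= S_{33\cdot012}^{-1},
\end{aligned} \tag{3-11}$$

in which

$$\begin{aligned}
&c_{10} = -b_{10}c_{11\cdot0}; \quad c_{20\cdot1} = -b_{20\cdot1}c_{22\cdot01}, \quad c_{21\cdot0} = -b_{21\cdot0}c_{22\cdot01}; \quad c_{30\cdot12} = -b_{30\cdot12}c_{33\cdot012},\\
&c_{31\cdot02} = -b_{31\cdot02}c_{33\cdot012}, \quad c_{32\cdot01} = -b_{32\cdot01}c_{33\cdot012},
\end{aligned}$$

and so on. If we denote the 'mean squares' by

$$S_{yy\cdot0}/(N-1) = s_0^2, \quad S_{yy\cdot01}/(N-2) = s_1^2, \quad \ldots,$$

then, $V$ denoting 'estimated variance of', we have

$$\begin{aligned}
V(b_{y0}) &= c_{00}s_0^2;\\
V(b_{y0\cdot1}) &= c_{00\cdot1}s_1^2, & V(b_{y1\cdot0}) &= c_{11\cdot0}s_1^2;\\
V(b_{y0\cdot12}) &= c_{00\cdot12}s_2^2, & V(b_{y1\cdot02}) &= c_{11\cdot02}s_2^2;\\
V(b_{y0\cdot123}) &= c_{00\cdot123}s_3^2, & V(b_{y1\cdot023}) &= c_{11\cdot023}s_3^2;\\
V(b_{y2\cdot01}) &= c_{22\cdot01}s_2^2;\\
V(b_{y2\cdot013}) &= c_{22\cdot013}s_3^2, & V(b_{y3\cdot012}) &= c_{33\cdot012}s_3^2;
\end{aligned} \tag{3-12}$$

and so on. Finally, it may be useful to have formulae for the estimated variances of the ordinates of the different order curves. Calling these ordinates $Y_0, Y_1, \ldots$, we have

$$\begin{aligned}
V(Y_0) &= c_{00}\xi_0^2 s_0^2,\\
V(Y_1) &= (c_{00}\xi_0^2 + c_{11\cdot0}\xi_1^2)s_1^2,\\
V(Y_2) &= (c_{00}\xi_0^2 + c_{11\cdot0}\xi_1^2 + c_{22\cdot01}\xi_2^2)s_2^2,\\
V(Y_3) &= (c_{00}\xi_0^2 + c_{11\cdot0}\xi_1^2 + c_{22\cdot01}\xi_2^2 + c_{33\cdot012}\xi_3^2)s_3^2,
\end{aligned} \tag{3-13}$$

and so on, in which, of course, $\xi_0 = 1$ and the values of $\xi_1, \xi_2, \ldots$ are obtained from (3-8) for any value of $x$.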
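Since the $c$'s of a given order form the matrix inverse to that of the corresponding $S$'s, the multipliers needed in (3-12) can be checked by a direct inversion. The Python below is a sketch under stated assumptions, not the recursive scheme of (3-11): it inverts the 3 × 3 matrix of sums for a quadratic by Gauss-Jordan elimination, using the sums and the mean square $s_2^2 = 0.4011$ quoted in the numerical example of §5; the helper name is hypothetical.

```python
import math


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Weighted sums of powers of x up to the fourth, arranged as S_rs (quadratic fit).
s_matrix = [
    [14, 91, 803],
    [91, 803, 7705],
    [803, 7705, 77447],
]
c = invert(s_matrix)
s2_squared = 0.4011   # residual mean square after the quadratic term
std_errors = [math.sqrt(s2_squared * c[i][i]) for i in range(3)]
print([round(c[i][i], 7) for i in range(3)])   # c_{00.12}, c_{11.02}, c_{22.01}
print([round(se, 4) for se in std_errors])
```

The diagonal elements correspond to $c_{00\cdot12}$, $c_{11\cdot02}$ and $c_{22\cdot01}$, and the square roots give the standard errors of the three coefficients of the quadratic.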

4. For practical computation purposes it is desirable to systematize the evaluation of 
the quantities set out in succession in §3. This leads to the tabular scheme set out in Table 2, 
which proceeds as far as the fitting of a cubic, and can readily be extended for higher degree 
polynomials. Owing to the symmetry of the original matrix of sums of squares and products, 
one need only enter quantities on and to the right of the diagonals. This is indicated by the 
heavy vertical lines in the upper part of the table. A small number of supplementary lines 
are needed to obtain the actual polynomial coefficients; these are on the left immediately 
below the main table. The column on the right is used for the various contributions to the 
sum of squares, and from this the analysis of variance is readily written down. The check 
column is of the usual character for this type of work. The operations of the rest of the table 
are followed here, and then in those lines where a tick mark is shown the check is that the 
number in this column is the sum of all numbers in that line to the left, stopping at the 
heavy vertical rule. A useful final check is shown in the lower box on the right. The table 
shown in lines 1’-16’ is a subsidiary one, giving the matrix of the c’s, inverse to that of the 
S’s. Only the diagonal c’s are required for the estimated variances of (3-12) and (3-13), but 
the whole matrix has been included for the sake of completeness. It is useful to have all 
the c’s readily available, particularly when the scheme is used for multiple regression 
(see §2), since they are needed in calculating adjusted coefficients of the surviving inde- 
pendent variates, without redoing the calculations, when it has been decided to omit any 
such variate. The appropriate formulae for this purpose were given by R. A. Fisher in the 
5th edition of Statistical Methods for Research Workers, and discussed by Cochran (1938). 
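Table 2

(The symbol ‖ marks the position of the heavy vertical rule; √ marks a line on which the check applies.)

Line   x^0                    x^1                    x^2                      x^3                      y                         Check    Sums of squares
 1   ‖ S_{00}                 S_{10}                 S_{20}                   S_{30}                   S_{y0}                             S_{yy}
 2   ‖ -1                     -b_{10}                -b_{20}                  -b_{30}                  -b_{y0}                   √        b_{y0} S_{y0}
 3                          ‖ S_{11}                 S_{21}                   S_{31}                   S_{y1}
 4                          ‖ -b_{10} S_{10}         -b_{10} S_{20}           -b_{10} S_{30}           -b_{10} S_{y0}
 5     -b_{10}              ‖ S_{11·0}               S_{21·0}                 S_{31·0}                 S_{y1·0}                  √
 6     c_{10}               ‖ -1                     -b_{21·0}                -b_{31·0}                -b_{y1·0}                 √        b_{y1·0} S_{y1·0}
 7                                                 ‖ S_{22}                   S_{32}                   S_{y2}
 8     -b_{20}                                     ‖ -b_{20} S_{20}           -b_{20} S_{30}           -b_{20} S_{y0}
 9     b_{21·0} b_{10}        -b_{21·0}            ‖ -b_{21·0} S_{21·0}       -b_{21·0} S_{31·0}       -b_{21·0} S_{y1·0}
10     -b_{20·1}              -b_{21·0}            ‖ S_{22·01}                S_{32·01}                S_{y2·01}                 √
11     c_{20·1}               c_{21·0}             ‖ -1                       -b_{32·01}               -b_{y2·01}                √        b_{y2·01} S_{y2·01}
12                                                                          ‖ S_{33}                   S_{y3}
13     -b_{30}                                                              ‖ -b_{30} S_{30}           -b_{30} S_{y0}
14     b_{31·0} b_{10}        -b_{31·0}                                     ‖ -b_{31·0} S_{31·0}       -b_{31·0} S_{y1·0}
15     b_{32·01} b_{20·1}     b_{32·01} b_{21·0}     -b_{32·01}             ‖ -b_{32·01} S_{32·01}     -b_{32·01} S_{y2·01}
16     -b_{30·12}             -b_{31·02}             -b_{32·01}             ‖ S_{33·012}               S_{y3·012}                √
17     c_{30·12}              c_{31·02}              c_{32·01}              ‖ -1                       -b_{y3·012}               √        b_{y3·012} S_{y3·012}
                                                                                                                                          S_{yy} - S_{yy·0123}

Supplementary lines (polynomial coefficients)

18     b_{y0}
19     -b_{10} b_{y1·0}           b_{y1·0}
20     b_{y0·1}                   b_{y1·0}
21     -b_{20·1} b_{y2·01}        -b_{21·0} b_{y2·01}        b_{y2·01}
22     b_{y0·12}                  b_{y1·02}                  b_{y2·01}
23     -b_{30·12} b_{y3·012}      -b_{31·02} b_{y3·012}      -b_{32·01} b_{y3·012}      b_{y3·012}
24     b_{y0·123}                 b_{y1·023}                 b_{y2·013}                 b_{y3·012}

Final check:  b_{y0·123} S_{y0} + b_{y1·023} S_{y1} + b_{y2·013} S_{y2} + b_{y3·012} S_{y3} = S_{yy} - S_{yy·0123}

Subsidiary table giving the c's.

 1'    c_{00}
 2'    -b_{10} c_{10}
 3'    c_{00·1}
 4'    -b_{20·1} c_{20·1}
 5'    c_{00·12}
 6'    -b_{30·12} c_{30·12}
 7'    c_{00·123}
 8'    c_{10}                     c_{11·0}
 9'    -b_{21·0} c_{20·1}         -b_{21·0} c_{21·0}
10'    c_{10·2}                   c_{11·02}
11'    -b_{31·02} c_{30·12}       -b_{31·02} c_{31·02}
12'    c_{10·23}                  c_{11·023}
13'    c_{20·1}                   c_{21·0}                   c_{22·01}
14'    -b_{32·01} c_{30·12}       -b_{32·01} c_{31·02}       -b_{32·01} c_{32·01}
15'    c_{20·13}                  c_{21·03}                  c_{22·013}
16'    c_{30·12}                  c_{31·02}                  c_{32·01}                  c_{33·012}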
Table 3

(The symbol ‖ marks the position of the heavy vertical rule; √ marks a line on which the check applies.)

Line   x^0              x^1              x^2               x^3              y               Check                Sums of squares
 1   ‖ 14               91               803               7705             78              8691                 538
 2   ‖ -1               -6.5             -57.357143        -550.357143      -5.571429       -620.78571    √      434.5715
 3                    ‖ 803              7705              77447            589             86635
 4                    ‖ -591.5           -5219.5           -50082.5         -507            -56491.5
 5     -6.5           ‖ 211.5            2485.5            27364.5          82              30143.5       √
 6     -0.0307329     ‖ -1               -11.751773        -129.382979      -0.387707       -142.52246    √      31.7920
 7                                     ‖ 77447             804121           5049            895125
 8     -57.357143                      ‖ -46057.7858       -441936.787      -4473.8573      -498490.930
 9     76.386525        -11.751773     ‖ -29209.0318       -321581.392      -963.6454       -354239.569
10     19.029382        -11.751773     ‖ 2180.1824         40602.821        -388.5027       42394.501     √
11     0.00872834       -0.00539027    ‖ -1                -18.623589       0.178197        -19.44539     √      69.2300
12                                                       ‖ 8555663          46207           9491143
13     -550.357143                                       ‖ -4240501.8       -42927.857      -4783153.93
14     840.989364       -129.382979                      ‖ -3540500.5       -10609.404      -3900055.83
15     -354.395389      218.860190       -18.623589      ‖ -756170.3        7235.315        -789537.76
16     -63.763168       89.477211        -18.623589      ‖ 18490.4          -94.946         18395.48      √
17     -0.00344845      0.00483911       -0.00100720     ‖ -1               0.005135        -0.994865     √      0.4875
                                                                                                                 536.0810

Supplementary lines (polynomial coefficients)

18     5.571429
19     -2.520096        0.387707
20     3.051333         0.387707
21     -3.390979        2.094131         -0.178197
22     -0.339646        2.481838         -0.178197
23     0.327424         -0.459465        0.095632          -0.005135
24     -0.012222        2.022373         -0.082565         -0.005135

Final check

       -0.9533
       1191.1777
       -416.8707
       -237.2729
       536.0808

Subsidiary table giving the c's.

 1'    0.0714286
 2'    0.1997638
 3'    0.2711924
 4'    0.1660949
 5'    0.4372873
 6'    0.2198841
 7'    0.6571714
 8'    -0.0307329       0.0047281
 9'    -0.1025736       0.0633452
10'    -0.1333065       0.0680733
11'    -0.3085577       0.4329902
12'    -0.4418642       0.5010635
13'    0.00872834       -0.00539027      0.000458677
14'    0.06422252       -0.09012160      0.018757678
15'    0.07295086       -0.09551187      0.01921635
16'    -0.00344845      0.00483911       -0.00100720       0.0000540821
Little explanation is needed of the operation of this scheme. The primary $S$'s (from (3-1) and (3-2)) are entered in lines 1, 3, 7 and 12. To form line 2, all quantities in line 1 are divided by $S_{00}$, whose reciprocal is $c_{00}$, a quantity which is at the same time entered on line 1'. In this and subsequent calculations the signs are changed to the right of the heavy vertical rule. Line 6 is similarly obtained from line 5, line 11 from line 10 and line 17 from line 16, by dividing by $S_{11\cdot0}$, $S_{22\cdot01}$ and $S_{33\cdot012}$, whose reciprocals are $c_{11\cdot0}$, $c_{22\cdot01}$ and $c_{33\cdot012}$ respectively. Line 4 comes from line 1 as shown using the multiplier $-b_{10}$; the next line is the sum of lines 3 and 4, as is indicated by the short horizontal rules. At the next stage lines 8 and 9 are derived from lines 1 and 5 by the use of $b$-multipliers already determined in lines 2 and 6; line 10 is another summation, this time of lines 7, 8 and 9. And so on. The appropriate $b$'s and $c$'s are entered in the supplementary lines and subsidiary table as calculated, the subsequent calculations in these lines being as shown; vertical additions are made in pairs of lines, for example line 20 is the sum of lines 18 and 19, line 3' is the sum of lines 1' and 2', and so on.

We are only concerned in this paper with the fitting of orthogonal polynomials, and the 
computation scheme has therefore been drawn up with this end in view. But it should be 
pointed out that the scheme generally is one for matrix inversion, at any rate where the 
simultaneous equations whose solution is desired are symmetrical, and it may be found to 
compare favourably with other schemes which have been advocated in the past. It requires 
no adaptation to be used in standard multiple regression work, as described in §2. 


5. Numerical example. We shall fit as far as the third degree polynomial the following 
data: 


Weight (w)     1     2     1     1     2     3     1     2     1
Values of x    0     1     3     4     7     8    10    11    12
Values of y    0     2     5     6     9     8     7     5     4


Since for a polynomial of the pth degree we require the weighted sums of the first 2p 
powers of x and the weighted sums of products of y with the first p powers of x, as well as 
the weighted sum of squares of y (for the analysis of variance), the first step consists in 
calculating these quantities, either by means of a tabular build-up or by using tables of 
powers. The calculation yields 


S_{00} = Σ(w) = 14                                       S_{y0} = Σ(wy) = 78
S_{10} = S_{01} = Σ(wx) = 91                             S_{y1} = Σ(wyx) = 589
S_{20} = S_{11} = S_{02} = Σ(wx^2) = 803                 S_{y2} = Σ(wyx^2) = 5049
S_{30} = S_{21} = S_{12} = S_{03} = Σ(wx^3) = 7705       S_{y3} = Σ(wyx^3) = 46207
S_{31} = S_{22} = S_{13} = Σ(wx^4) = 77447
S_{32} = S_{23} = Σ(wx^5) = 804121
S_{33} = Σ(wx^6) = 8555663                               S_{yy} = Σ(wy^2) = 538
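These primary sums are easily reproduced from the data. The short Python sketch below uses illustrative helper names; it forms the weighted power sums of $x$ up to the sixth power, the weighted sums of products of $y$ with powers of $x$ up to the third, and the weighted sum of squares of $y$.

```python
w = [1, 2, 1, 1, 2, 3, 1, 2, 1]
x = [0, 1, 3, 4, 7, 8, 10, 11, 12]
y = [0, 2, 5, 6, 9, 8, 7, 5, 4]


def power_sum(r):
    """Weighted sum of the r-th powers of x."""
    return sum(wi * xi ** r for wi, xi in zip(w, x))


def product_sum(r):
    """Weighted sum of products of y with the r-th power of x."""
    return sum(wi * yi * xi ** r for wi, xi, yi in zip(w, x, y))


x_sums = [power_sum(r) for r in range(7)]       # needed up to power 2p = 6
xy_sums = [product_sum(r) for r in range(4)]    # needed up to power p = 3
yy_sum = sum(wi * yi ** 2 for wi, yi in zip(w, y))
print(x_sums)    # [14, 91, 803, 7705, 77447, 804121, 8555663]
print(xy_sums)   # [78, 589, 5049, 46207]
print(yy_sum)    # 538
```

Working in integers keeps the sums exact, which matters because the later stages of the scheme subtract quantities of nearly equal size.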


These quantities are now entered in the appropriate places of Table 2, and calculation 
proceeds. This is shown in Table 3, in which enough decimal places have been retained to 
show agreement at the final check to two or more places of decimals. Fewer figures need be 
written down if the accuracy desired is not of this order. For all ordinary purposes a 
maximum of four decimal places, or perhaps four significant figures, will suffice. 
The fitted curves of degrees 0, 1, 2 and 3 are: 
$$
\begin{aligned}
Y_0 &= 5.57143,\\
Y_1 &= 3.05133 + 0.38771x,\\
Y_2 &= -0.33965 + 2.48184x - 0.17820x^2,\\
Y_3 &= -0.01222 + 2.02237x - 0.08256x^2 - 0.00514x^3.
\end{aligned}
$$
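
The weighted sums and the fitted curves of this example can be reproduced with a few lines of Python. The sketch below is not the tabular scheme of the paper: it simply forms the normal equations from the weighted power sums and solves them by Gaussian elimination. The function names (`power_sum`, `product_sum`, `solve`, `fit`) are hypothetical and introduced only for illustration.

```python
# Data of the numerical example: weights w, abscissae x, ordinates y.
w = [1, 2, 1, 1, 2, 3, 1, 2, 1]
x = [0, 1, 3, 4, 7, 8, 10, 11, 12]
y = [0, 2, 5, 6, 9, 8, 7, 5, 4]

def power_sum(p):
    """Weighted sum of x**p."""
    return sum(wi * xi**p for wi, xi in zip(w, x))

def product_sum(p):
    """Weighted sum of y * x**p."""
    return sum(wi * yi * xi**p for wi, xi, yi in zip(w, x, y))

def solve(a, b):
    """Solve the linear system a c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef

def fit(degree):
    """Coefficients (constant term first) of the weighted least-squares polynomial."""
    a = [[float(power_sum(i + j)) for j in range(degree + 1)]
         for i in range(degree + 1)]
    b = [float(product_sum(i)) for i in range(degree + 1)]
    return solve(a, b)

sums = [power_sum(p) for p in range(7)]      # sums of w x^0 ... w x^6
fits = {d: fit(d) for d in range(4)}         # degrees 0 to 3
for d, coef in fits.items():
    print(d, [round(c, 5) for c in coef])
```

Solving the normal equations directly is adequate for nine points and degree three; for higher degrees the orthogonal scheme of the paper is better conditioned.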


The analysis of variance need not be separately written out. The residual sum of squares, 
from Table 3, is 538 - 536.0810 = 1.9190, with 5 degrees of freedom. $s^2$ is therefore 0.3838, 
whence it follows that the linear and quadratic terms, with mean squares 31.7920 and 
69.2300, are significant at the 0.1 % point. The cubic term, with mean square 0.4875, is not 
significant. If this were a practical problem we should either go on to one more stage, or, 
if that was not thought worth while, rest content with describing the data by means of the 
above quadratic $Y_2$. This may be plotted against the actual data by substituting for x, when 
we get the following results: 

Weight (w)         1      2      1      1      2      3      1      2      1
Values of y        0      2      5      6      9      8      7      5      4
Values of Y_2  -0.34   1.96   5.50   6.74   8.30   8.11   6.66   5.40   3.78
$s_2^2$ will be (0.4875 + 1.9190)/6 = 2.4065/6 = 0.4011. This is multiplied by $c_{00.12}$, $c_{11.02}$ and 
$c_{22.01}$, respectively, and the square roots taken, to deduce the standard errors of the coefficients 
$b_{y0.12}$, $b_{y1.02}$ and $b_{y2.01}$, respectively, of $Y_2$. These standard errors are 0.4188, 0.1652 and 0.0136 
respectively. From these it appears that $b_{y1.02}$ and $b_{y2.01}$ are large in comparison with their 
standard errors, and very significant, but that $b_{y0.12}$ is not significantly different from zero. 
The calculation of the estimated standard error of an estimated $Y_2$ may be illustrated by 
taking the value 6.74, which corresponds to x = 4. $s_2^2$ = 0.4011 has to be multiplied, as in 
(3.13), by
$$c_{00.12}\,\xi_0^2 + c_{11.02}\,\xi_1^2 + c_{22.01}\,\xi_2^2,$$
in which the c’s are obtained from Table 3 and the ξ’s are given in (3.8) in terms of the b’s, 
also obtained from Table 3, and x. Putting x = 4, multiplying the above expression by 
0.4011 and extracting the square root, we find the estimated standard error at this point 
to be 0.26. 
POPULATION DIFFERENCES BETWEEN SPECIES GROWING 
ACCORDING TO SIMPLE BIRTH AND DEATH PROCESSES 


By J. H. DARWIN 


1. INTRODUCTION 


In some ecological problems, treated especially by C. B. Williams (e.g. 1944), the numerical 
sizes of different species of plants or animals present in a sampling region have followed 
Fisher’s logarithmic law, viz. the expected number of species having an observed population 
size n is $\alpha x^n/n$ (n = 1, 2, ...). In this α is a constant depending only on the biological 
association being sampled and not on the interval of sampling. This law for the observed species 
was derived from the negative binomial distribution, for which the probability that a 
species has n members is given by 


$$P_n = \frac{\Gamma(n+k)}{\Gamma(n+1)\,\Gamma(k)}\,\frac{\sigma^n}{(1+\sigma)^{n+k}} \qquad (n = 0, 1, \ldots),$$

by allowing the parameter k to tend to 0, and the number N of possible species to tend to 
infinity in such a way that Nk = α, σ/(1+σ) = x. The negative binomial is a compound 
Poisson distribution, i.e. it arises from the assumption that the population of each species 
is represented by a Poisson distribution with mean m, the different m’s for the different 
species being regarded as a random sample from the distribution 

$$\frac{1}{\sigma^k\,\Gamma(k)}\,x^{k-1}e^{-x/\sigma}\,dx.$$


Thus if there is a large number of species, mostly rare, in danger of losing a member to the 
sample, Fisher’s logarithmic law might be a reasonable description of the heterogeneity 
between them. 
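
The passage to the limit described above can be checked numerically. The following Python sketch evaluates N P_n for the negative binomial with Nk = α held fixed and k decreasing, and compares it with the logarithmic term α x^n/n. The numerical values of α and x are illustrative assumptions, not taken from any data set in the paper, and the function names are hypothetical.

```python
import math

def negative_binomial(n, k, sigma):
    """P_n = Gamma(n+k) / (Gamma(n+1) Gamma(k)) * sigma^n / (1+sigma)^(n+k)."""
    log_p = (math.lgamma(n + k) - math.lgamma(n + 1) - math.lgamma(k)
             + n * math.log(sigma) - (n + k) * math.log1p(sigma))
    return math.exp(log_p)

def expected_species(n, alpha, x, k):
    """N * P_n with N = alpha / k and sigma chosen so that sigma / (1 + sigma) = x."""
    sigma = x / (1.0 - x)
    return (alpha / k) * negative_binomial(n, k, sigma)

def log_series(n, alpha, x):
    """Fisher's logarithmic term alpha * x^n / n."""
    return alpha * x**n / n

alpha, x = 40.0, 0.9          # illustrative values only
for k in (1e-1, 1e-3, 1e-6):
    print(k, [round(expected_species(n, alpha, x, k), 4) for n in (1, 2, 5)])
print("limit", [round(log_series(n, alpha, x), 4) for n in (1, 2, 5)])
```

Working with logarithms of the gamma functions avoids overflow when k is very small.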

One may further discuss this heterogeneity in terms of the manner in which different 
species grow in the sampling region. For instance, D. G. Kendall (1948a) considered a 
population expanding in a simple birth and death process in which the probabilities of a birth or 
a death from among n members alive at time t are, for the next interval dt, nλ dt and nμ dt 
respectively; in dt there is also a probability κ dt of the entry into the region of a new member 
from outside; λ, μ and κ are constants. These transition probabilities lead to the negative 
binomial as a current description of the size of a species whose first member in the region is 
an immigrant. When the immigration constant κ is small compared with λ and μ, a 
logarithmic law with a zero class results. The often observed logarithmic type differences in 
specific population size would then be reasonable if the species come into the region at times 
randomly scattered in the interval (0, t), and grow roughly according to the same geometric 
birth and death process. The possibilities of such ‘explanations’ of heterogeneity between 
species can be formally investigated for different theoretical distributions. It will be shown 
that when the population size is bounded, the form of solution of processes for which the 
transition probability rates do not vary with time enables one to test if a particular dis- 
tribution can describe the state of the process at any time. When the transition probability 
rates do vary with time, it appears that there may be many stochastic geneses of such well- 
known distributions as the negative binomial and the Poisson. Finally, a simple evolutionary 
model for the changing number of species is applied to species which expand geometrically, 
and logarithmic type distributions of the observed population sizes of the species again 
occur. 

2. FINITE DISTRIBUTIONS 


If the size of the population can take any of the values 0, 1, ..., N, the equations of the birth 
and death process are 

$$\frac{dP_n}{dt} = \alpha_{n+1}P_{n+1} - (\alpha_n+\beta_n)P_n + \beta_{n-1}P_{n-1} \qquad (n = 1, 2, \ldots, N-1),$$
$$\frac{dP_0}{dt} = \alpha_1P_1 - \beta_0P_0, \qquad \frac{dP_N}{dt} = -\alpha_NP_N + \beta_{N-1}P_{N-1}, \tag{1}$$

in which $P_n$ is the probability that a species has size n (n could be the number of regions out 
of a possible N in which the species could be found); $\alpha_n$ is the probability death-rate and 
$\beta_n$ the probability birth-rate when the size is n. 

2.1. Transition probability rates not varying with time. When $\alpha_n$ and $\beta_n$ are constants, 
which is, for instance, reasonable when the spores or seeds of a plant or the offspring of an 
animal can travel over distances of the same order as the dimensions of a region, the set of 
equations can be written as 
$$\frac{dP}{dt} = -AP, \tag{2}$$
where P is the column vector of probabilities $P_0, P_1, \ldots, P_N$, and A is the matrix with non-zero 
elements only in the three main diagonals 

$$A = \begin{pmatrix}
\beta_0 & -\alpha_1 & & & \\
-\beta_0 & \alpha_1+\beta_1 & -\alpha_2 & & \\
 & -\beta_1 & \ddots & \ddots & \\
 & & \ddots & \alpha_{N-1}+\beta_{N-1} & -\alpha_N \\
 & & & -\beta_{N-1} & \alpha_N
\end{pmatrix}.$$

It is well known that if the latent roots $\lambda_i$ (i = 0, 1, ..., N) of this matrix are distinct, the 
solution $P_n$ is of the form $\sum_{i=0}^{N} A_{ni}e^{-\lambda_i t}$, where the $A_{ni}$ are constants independent of time and 
embracing the initial conditions. It is further possible to show that if $\alpha_1, \ldots, \alpha_N$, $\beta_0, \ldots, \beta_{N-1}$ 
are > 0, these roots $\lambda_i$ are real, distinct and positive (a result also established by B. J. 
Prendiville, unpublished). This follows from an expansion of the leading (n + 1) × (n + 1) minor of 
$|\lambda I - A|$ as $D_n(\lambda)$ in the form 

$$D_n(\lambda) = (\lambda - (\alpha_n+\beta_n))\,D_{n-1}(\lambda) - \alpha_n\beta_{n-1}\,D_{n-2}(\lambda). \tag{3}$$
An induction hypothesis that $D_{n-1}(\lambda)$ and $D_{n-2}(\lambda)$ have real distinct roots $\mu_0, \mu_1, \ldots, \mu_{n-1}$; 
$\nu_0, \nu_1, \ldots, \nu_{n-2}$, such that $\mu_0 < \nu_0 < \mu_1 < \ldots < \nu_{n-2} < \mu_{n-1}$, leads to the conclusion that $D_n(\lambda)$ 
has real distinct roots separating those of $D_{n-1}(\lambda)$. An expansion of $|\lambda I - A|$ shows, by the 
sign rule, that there can be no real negative roots, and that the lowest root of $D_n(\lambda)$ for 
n < N is > 0. The result then follows. It can be used as a criterion for testing if possible 
processes of this simple type could lead to given distributions. 
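
Recurrence (3) is easy to evaluate numerically. The Python sketch below builds the sequence of leading minors for an illustrative choice of rates, α_n = n and β_n = N − n with N = 4 (an assumption made only for the example), and locates the roots of the full determinant by counting sign changes on a grid. For this choice the roots are 0, 2, 4, 6 and 8; the root at zero arises because the columns of A sum to zero, while the roots of the lower minors are strictly positive.

```python
def leading_minors(lam, alpha, beta):
    """Values D_0(lam), ..., D_N(lam) of the leading minors of |lam*I - A|,
    built with recurrence (3); alpha[0] and beta[N] must be zero."""
    d_prev2 = 1.0                                  # D_{-1}, the empty determinant
    d_prev1 = lam - (alpha[0] + beta[0])           # D_0
    values = [d_prev1]
    for n in range(1, len(alpha)):
        d = ((lam - (alpha[n] + beta[n])) * d_prev1
             - alpha[n] * beta[n - 1] * d_prev2)
        values.append(d)
        d_prev2, d_prev1 = d_prev1, d
    return values

def count_sign_changes(values):
    """Number of sign changes in a sequence, ignoring exact zeros."""
    signs = [v > 0 for v in values if v != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

# Illustrative rates (an assumption): alpha_n = n, beta_n = N - n, N = 4.
N = 4
alpha = [float(n) for n in range(N + 1)]
beta = [float(N - n) for n in range(N + 1)]

grid = [-0.5 + 0.25 * i for i in range(40)]        # -0.5, -0.25, ..., 9.25
d_full = [leading_minors(lam, alpha, beta)[-1] for lam in grid]
roots_check = [leading_minors(float(r), alpha, beta)[-1] for r in (0, 2, 4, 6, 8)]
print("sign changes of D_N on the grid:", count_sign_changes(d_full))
print("D_N at 0, 2, 4, 6, 8:", roots_check)
```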

Thus, if 
$$\frac{dP_n}{dt} = P_n\,q_n(t) \quad\text{and}\quad P_{n+1} = r_n(t)\,P_n,$$
(1) becomes, when $P_n(t)$ is not identically zero, 

$$q_n(t) = \alpha_{n+1}r_n - (\alpha_n+\beta_n) + \beta_{n-1}/r_{n-1}. \tag{4}$$
This is a set of N + 1 equations in the N + 1 coefficients of $e^{-\lambda_i t}$. 

2.2. The binomial distribution. In the binomial, $P_n = \binom{N}{n}p^n(1-p)^{N-n}$, since $P_N = p^N$ must be of the 
form $\sum_i A_{Ni}e^{-\lambda_i t}$, p can only have the form $a + be^{-\mu t}$, with 0 < a + b < 1 and 0 < a < 1. Then 

$$q_n = \left(\frac{n}{p} - \frac{N-n}{1-p}\right)\left(-\mu b e^{-\mu t}\right) \quad\text{and}\quad r_n = \frac{N-n}{n+1}\,\frac{p}{1-p}.$$

Whether or not a solution is possible can be seen by substitution in (4). This produces the 
sets of equations 

$$\alpha_{n+1}\frac{N-n}{n+1}\,a^2 - (\alpha_n+\beta_n)\,a(1-a) + \beta_{n-1}\frac{n}{N-(n-1)}\,(1-a)^2 = 0, \tag{5}$$
$$\alpha_{n+1}\frac{N-n}{n+1}\,2a - (\alpha_n+\beta_n)(1-2a) - \beta_{n-1}\frac{n}{N-(n-1)}\,2(1-a) = \mu(Na-n), \tag{6}$$
$$\alpha_{n+1}\frac{N-n}{n+1} + (\alpha_n+\beta_n) + \beta_{n-1}\frac{n}{N-(n-1)} = \mu N. \tag{7}$$

These provide 3(N + 1) equations for 2(N + 1) unknowns $\alpha_n$, $\beta_n$, but the sets are not 
independent. The unique solution satisfying them is the well-known one 
$$\alpha_n = \alpha n, \quad \beta_n = \beta(N-n); \qquad a = \beta/(\alpha+\beta), \quad \mu = \alpha+\beta,$$
and b is determined by the initial conditions. 
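
As a check on this solution, the equations (1) with α_n = αn and β_n = β(N − n) can be integrated numerically and compared with a binomial whose parameter follows p = a + b e^{−μt}. The Python sketch below uses a classical fourth-order Runge-Kutta step; the values of N, α, β and the starting value of p are illustrative assumptions.

```python
import math

N, ALPHA, BETA = 5, 0.7, 0.3        # illustrative constants

def derivative(P):
    """Equations (1) with alpha_n = ALPHA * n and beta_n = BETA * (N - n)."""
    dP = []
    for n in range(N + 1):
        d = -(ALPHA * n + BETA * (N - n)) * P[n]
        if n < N:
            d += ALPHA * (n + 1) * P[n + 1]
        if n > 0:
            d += BETA * (N - (n - 1)) * P[n - 1]
        dP.append(d)
    return dP

def rk4(P, t_end, steps):
    """Classical Runge-Kutta integration of dP/dt = derivative(P)."""
    h = t_end / steps
    for _ in range(steps):
        k1 = derivative(P)
        k2 = derivative([p + h / 2 * k for p, k in zip(P, k1)])
        k3 = derivative([p + h / 2 * k for p, k in zip(P, k2)])
        k4 = derivative([p + h * k for p, k in zip(P, k3)])
        P = [p + h / 6 * (a + 2 * b + 2 * c + d)
             for p, a, b, c, d in zip(P, k1, k2, k3, k4)]
    return P

def binomial(p):
    return [math.comb(N, n) * p**n * (1 - p)**(N - n) for n in range(N + 1)]

p_start = 0.1
a = BETA / (ALPHA + BETA)           # limiting value of p
b = p_start - a                     # fixed by the initial condition
mu = ALPHA + BETA
t_end = 2.0
numeric = rk4(binomial(p_start), t_end, steps=2000)
exact = binomial(a + b * math.exp(-mu * t_end))
print(max(abs(u - v) for u, v in zip(numeric, exact)))
```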
Polya (1930) considered the compound binomial distribution 

$$P_n = \int_0^1 \binom{N}{n}p^n(1-p)^{N-n}\,\frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha,\beta)}\,dp
= \binom{N}{n}\frac{\alpha(\alpha+1)\cdots(\alpha+n-1)\,\beta(\beta+1)\cdots(\beta+N-n-1)}{(\alpha+\beta)(\alpha+\beta+1)\cdots(\alpha+\beta+N-1)}. \tag{8}$$

In his work it arose from a contagion urn scheme in which, after each of N drawings from an 
urn containing α white balls and β black, two balls of the colour drawn are replaced. $P_n$ is 
then the probability of obtaining n white balls. Skellam (1948) applies the distribution to 
the secondary association of chromosomes in Brassica, and to accident and traffic problems. 
It can also be used as a description of the spread of species over N regions when the 
population size of a species increases by one for each new region in which it is found. The 
heterogeneity in the binomial then describes that between the abundances of the different species. 

The form of $P_n$, as far as the constants α and β are concerned, is as in a hypergeometric 
distribution, so that the present result will also apply to that. 
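
The equivalence of formula (8) and the urn scheme can be confirmed by direct enumeration. In the Python sketch below, `compound_binomial` evaluates the product form of (8), and `urn_distribution` follows the urn draw by draw, returning each drawn ball together with one more of the same colour. The function names and the example values (N = 6, two white and three black balls) are assumptions made for illustration.

```python
from math import comb

def compound_binomial(n, N, a, b):
    """Product form of (8) for n white balls in N draws."""
    num = 1.0
    for j in range(n):
        num *= a + j
    for j in range(N - n):
        num *= b + j
    den = 1.0
    for j in range(N):
        den *= a + b + j
    return comb(N, n) * num / den

def urn_distribution(N, white, black):
    """Exact distribution of the number of white balls drawn in N draws when
    each drawn ball is returned with one extra ball of the same colour."""
    dist = [1.0]                            # dist[w] = P(w whites so far)
    for draw in range(N):
        total = white + black + draw        # one ball added per earlier draw
        new = [0.0] * (len(dist) + 1)
        for w_drawn, prob in enumerate(dist):
            p_white = (white + w_drawn) / total
            new[w_drawn + 1] += prob * p_white
            new[w_drawn] += prob * (1.0 - p_white)
        dist = new
    return dist

N_draws, white, black = 6, 2, 3             # illustrative values
from_formula = [compound_binomial(n, N_draws, white, black) for n in range(N_draws + 1)]
from_urn = urn_distribution(N_draws, white, black)
print([round(p, 5) for p in from_formula])
```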

The only rational form of α and β that will not make $P_n$ an infinite series in powers of some 
$e^{-\mu t}$ is $\alpha = a - ce^{-\mu t}$ and $\beta = b + ce^{-\mu t}$, where 0 < a, b < 1, and c < a, 1 − b if c > 0, and −c < b, 
1 − a if c < 0. Such an assumption is consistent, for instance, with the spread of a species 
from a few regions to many, or with the reverse process. 





The equations similar to (5), (6) and (7) are now N + 1 in number, the simplest to find 
coming from the coefficients of $e^{0}$, $e^{-N\mu t}$ and $e^{-\mu t}$. They are: 
$$\begin{aligned}
&\alpha_{n+1}(a+n)(a+n-1)(N-n)(N-(n-1))\\
&\quad - (\alpha_n+\beta_n)(n+1)(N-(n-1))(b+N-(n+1))(a+n-1)\\
&\quad + \beta_{n-1}\,n(n+1)(b+N-n)(b+N-(n+1)) = 0,
\end{aligned} \tag{9}$$
$$\begin{aligned}
&\alpha_{n+1}(N-n)(N-(n-1)) + (\alpha_n+\beta_n)(n+1)(N-(n-1)) + \beta_{n-1}\,n(n+1)\\
&\quad = -k(n+1)(N-(n-1)),
\end{aligned} \tag{10}$$
$$\begin{aligned}
&\alpha_{n+1}(N-n)(N-(n-1))(a+n)(a+n-1)\,[\psi(a+n-1)-\psi(a)-\psi(b+N-(n+1))+\psi(b)]\\
&\quad - (\alpha_n+\beta_n)(n+1)(N-(n-1))(a+n-1)(b+N-(n+1))\\
&\qquad\times[\psi(a+n)-\psi(a)-\psi(b+N-n)+\psi(b)]\\
&\quad + \beta_{n-1}\,n(n+1)(b+N-n)(b+N-(n+1))\\
&\qquad\times[\psi(a+n-1)-\psi(a)-\psi(b+N-(n-1))+\psi(b)]\\
&\quad = -k(n+1)(N-(n-1))(b+N-(n+1))(a+n-1)\\
&\qquad\times[\psi(a+n)-\psi(a)-\psi(b+N-n)+\psi(b)],
\end{aligned} \tag{11}$$

where $\psi(x) = \dfrac{d}{dx}\log\Gamma(x)$. 

(9) and (10) are satisfied uniquely by 

$$\alpha_n = \frac{-kn(b+N-n)}{N(a+b+N-1)}, \qquad \beta_n = \frac{-k(a+n)(N-n)}{N(a+b+N-1)},$$

which do not satisfy (11). Thus this compound binomial cannot arise from a process of this 
simple birth and death type. 
3. THE MORE GENERAL EQUATIONS 


When n is not necessarily bounded by N and $\alpha_n$ and $\beta_n$ are general functions of time, the 
equations of the process are 

$$\frac{dP_n}{dt} = \alpha_{n+1}(t)P_{n+1} - (\alpha_n(t)+\beta_n(t))P_n + \beta_{n-1}(t)P_{n-1},$$
$$\frac{dP_0}{dt} = \alpha_1(t)P_1 - \beta_0(t)P_0. \tag{12}$$
The first n + 1 of these add to give 
$$\sum_{r=0}^{n}\frac{dP_r}{dt} = \alpha_{n+1}P_{n+1} - \beta_nP_n \qquad (n = 0, 1, \ldots).$$
Thus if it is possible to find a simple closed expression for $\sum_{r=0}^{n}P_r$, we can use this equation to 
find values for $\alpha_n$ and $\beta_n$. Otherwise these values can be found with greater difficulty 
directly from (12). 
3.1. The negative binomial distribution has the form 

$$P_n(t) = \frac{\Gamma(k+n)}{\Gamma(k)\,n!}\,\frac{\sigma^n}{(1+\sigma)^{n+k}} \qquad (n = 0, 1, \ldots).$$
Suppose k is constant and the distribution is generated from $P_0(0) = 1$, i.e. σ(0) = 0. Then, 
if σ′ = dσ/dt, 

$$\frac{dP_n}{dt} = \frac{\Gamma(n+k)}{\Gamma(k)(n-1)!}\,\frac{\sigma'\sigma^{n-1}}{(1+\sigma)^{n+k}} - \frac{\Gamma(n+k+1)}{\Gamma(k)\,n!}\,\frac{\sigma'\sigma^{n}}{(1+\sigma)^{n+k+1}} = f(n-1) - f(n),$$

say, and 

$$\sum_{r=0}^{n}\frac{dP_r}{dt} = -f(n) = -\frac{(n+k)\sigma'}{1+\sigma}\,P_n.$$

The required equation in n is thus, for n = 0, 1, ..., 

$$\frac{\sigma'}{1+\sigma} = -\frac{\alpha_{n+1}}{n+1}\,\frac{\sigma}{1+\sigma} + \frac{\beta_n}{n+k}. \tag{13}$$

The solution for σ when $\alpha_n$ and $\beta_n$ are given positive continuous functions is, as required, 
> 0. The simplest solution is Kendall’s (1948a), $\alpha_{n+1} = \alpha(n+1)$ and $\beta_n = \beta(n+k)$, where α 
and β are positive constants. But it is also true that the negative binomial will result when α 
and β vary with time. Generally 

$$\sigma = \exp\left(-\int_0^t(\alpha-\beta)\,dt\right)\int_0^t\beta\exp\left(\int_0^t(\alpha-\beta)\,dt\right)dt.$$

Polynomial solutions of higher degrees for $\alpha_n$ and $\beta_n$ will be more particular solutions in 
the sense that the coefficients of n, n², ..., cannot be assigned arbitrarily in both $\alpha_n$ 
and $\beta_n$. 
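
The claim that the negative binomial persists when α and β vary with time can be illustrated numerically. The Python sketch below integrates the system (12), truncated at a large population size, with α_n = α(t)n and β_n = β(t)(n + k), together with the equation σ′ = β − (α − β)σ that follows from (13), and then compares the computed probabilities with the negative binomial formula. The forms chosen for α(t) and β(t), the value of k and the truncation point are all illustrative assumptions.

```python
import math

K = 2.0         # negative binomial index k (illustrative)
N_MAX = 60      # truncation point for the infinite system (12)

def rates(t):
    """Illustrative time-varying factors: death alpha(t), birth beta(t)."""
    return 1.0 + 0.5 * math.sin(t), 0.5

def derivative(t, state):
    """state = [sigma, P_0, ..., P_N_MAX]; alpha_n = a n, beta_n = b (n + K)."""
    a, b = rates(t)
    sigma, P = state[0], state[1:]
    out = [b - (a - b) * sigma]
    for n in range(N_MAX + 1):
        birth = b * (n + K) if n < N_MAX else 0.0   # no births out of the last state
        d = -(a * n + birth) * P[n]
        if n < N_MAX:
            d += a * (n + 1) * P[n + 1]
        if n > 0:
            d += b * (n - 1 + K) * P[n - 1]
        out.append(d)
    return out

def rk4(state, t_end, steps):
    """Classical Runge-Kutta integration from t = 0 to t_end."""
    h = t_end / steps
    t = 0.0
    for _ in range(steps):
        k1 = derivative(t, state)
        k2 = derivative(t + h / 2, [s + h / 2 * k for s, k in zip(state, k1)])
        k3 = derivative(t + h / 2, [s + h / 2 * k for s, k in zip(state, k2)])
        k4 = derivative(t + h, [s + h * k for s, k in zip(state, k3)])
        state = [s + h / 6 * (p + 2 * q + 2 * r + u)
                 for s, p, q, r, u in zip(state, k1, k2, k3, k4)]
        t += h
    return state

def negative_binomial(n, k, sigma):
    return math.exp(math.lgamma(n + k) - math.lgamma(n + 1) - math.lgamma(k)
                    + n * math.log(sigma) - (n + k) * math.log1p(sigma))

start = [0.0, 1.0] + [0.0] * N_MAX          # sigma(0) = 0, P_0(0) = 1
final = rk4(start, t_end=1.0, steps=1000)
sigma, P = final[0], final[1:]
worst = max(abs(P[n] - negative_binomial(n, K, sigma)) for n in range(20))
print(sigma, worst)
```

The truncation is harmless here because the probability of reaching sizes near the cut-off in unit time is negligible for the rates chosen.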

3.2. For the positive binomial with probability parameter p and index N, the equation 
corresponding to (13) is 

$$\frac{dp}{dt} = -\frac{\alpha_{n+1}\,p}{n+1} + \frac{\beta_n(1-p)}{N-n} \qquad (n = 0, 1, \ldots, N-1). \tag{14}$$
A solution $\alpha_{n+1} = \alpha(t)(n+1)$, $\beta_n = \beta(t)(N-n)$ is then possible for positive functions α(t) 
and β(t). For instance, if n describes the number of regions out of N occupied by a species 
of plants, it is reasonable to suppose that the rate of new colonization is at its highest after 
each seeding time, when the rate of loss of colonies is at its lowest, i.e. $\alpha_n$ and $\beta_n$ are periodic 
and in anti-phase, a situation typified for example by 

$$\alpha_n(t) = (\tfrac12\omega)(1-\sin\omega t)\,n, \qquad \beta_n(t) = (\tfrac12\omega)(1+\sin\omega t)(N-n).$$
For these values, $p = \tfrac12\{1 + (1/\sqrt2)\sin(\omega t - \tfrac14\pi)\}$, having a time lag on $\beta_n(t)$ of half the phase 
length of $\alpha_n$ and $\beta_n$, corresponds to $p(0) = \tfrac14$. 
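
The periodic solution quoted above can be verified by integrating the equation for p that results from (14) with the anti-phase rates, namely dp/dt = β(t)(1 − p) − α(t)p with α(t) = ½ω(1 − sin ωt) and β(t) = ½ω(1 + sin ωt). The Python sketch below starts from p(0) = ¼ and compares the numerical solution with the closed form; the value of ω and the step count are illustrative assumptions.

```python
import math

def p_closed_form(t, omega):
    """p = (1/2) {1 + (1/sqrt 2) sin(omega t - pi/4)}."""
    return 0.5 * (1.0 + math.sin(omega * t - math.pi / 4) / math.sqrt(2.0))

def dp_dt(t, p, omega):
    a = 0.5 * omega * (1.0 - math.sin(omega * t))    # alpha(t), loss of colonies
    b = 0.5 * omega * (1.0 + math.sin(omega * t))    # beta(t), new colonization
    return b * (1.0 - p) - a * p

def integrate(omega, t_end, steps):
    """Classical Runge-Kutta integration starting from the closed form at t = 0."""
    h = t_end / steps
    t, p = 0.0, p_closed_form(0.0, omega)
    for _ in range(steps):
        k1 = dp_dt(t, p, omega)
        k2 = dp_dt(t + h / 2, p + h * k1 / 2, omega)
        k3 = dp_dt(t + h / 2, p + h * k2 / 2, omega)
        k4 = dp_dt(t + h, p + h * k3, omega)
        p += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return p

omega = 2.0                                           # illustrative value
print(p_closed_form(0.0, omega), integrate(omega, 3.0, 3000), p_closed_form(3.0, omega))
```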
The Poisson equation similar to (13) and (14) is, for a mean m, 

$$\frac{dm}{dt} = -\frac{\alpha_{n+1}\,m}{n+1} + \beta_n, \tag{15}$$

which again is satisfied by arbitrary positive functions α(t) and β(t) in $\alpha_n = \alpha(t)n$ and 
$\beta_n = \beta(t)$. 

3.3. The geometric distribution is the case k = 1 of (13), but the required birth 
probability rate $\beta_n = \beta(t)(n+1)$ is not that usually met in the genesis of a geometric law, viz. 
$\beta_n = \beta(t)n$. The difference arises from the different starting points. To get (13) we took 
$P_0(0) = 1$, and considered a formula for $P_n$ which held for all values of n including 0. Clearly 
a birth-rate proportional to the population size will not move a process away from $P_0(0) = 1$. 
If, however, $P_1(0) = 1$, a solution for the birth- and death-rates follows from the supposition 
that $P_0$ does not necessarily agree with the formula for $P_n$ with n = 0. 

Then 
$$P_n(t) = \frac{1-P_0(t)}{1+\sigma}\,\frac{\sigma^{n-1}}{(1+\sigma)^{n-1}} \qquad (n = 1, 2, \ldots).$$

The equations obtained from addition of the process equations are 

$$\frac{dP_0/dt}{1-P_0}\,\sigma(1+\sigma) - n\sigma' = \alpha_{n+1}\sigma - \beta_n(1+\sigma) \qquad (n = 1, 2, \ldots), \tag{16}$$
and the particular equation 
$$\frac{dP_0}{dt} = \alpha_1\,\frac{1-P_0}{1+\sigma} - \beta_0P_0. \tag{17}$$

The simplest solutions are those given by Kendall (1948b), viz. $\alpha_n = \alpha(t)n$ and $\beta_n = \beta(t)n$. 
For these, (16) and (17) reduce to 

$$\sigma' = -(\alpha-\beta)\sigma + \beta \quad\text{and}\quad \frac{dP_0/dt}{1-P_0} = \frac{\alpha}{1+\sigma}, \tag{18}$$

the solutions of which are valid under the initial conditions $P_0(0) = 0$, σ(0) = 0. 
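
For constant α and β the pair of equations (18) can be integrated and compared with the standard closed forms for the linear birth and death process started from one member. The Python sketch below does this; the closed-form expressions used for σ and for the extinction probability are the usual textbook results, stated here as the basis of the check, and the numerical values of α and β are illustrative assumptions.

```python
import math

ALPHA, BETA = 1.5, 1.0      # death and birth factors (illustrative, ALPHA != BETA)

def rhs(sigma, p0):
    """Equations (18) solved for the derivatives of sigma and P_0."""
    return BETA - (ALPHA - BETA) * sigma, ALPHA * (1.0 - p0) / (1.0 + sigma)

def integrate(t_end, steps):
    """Classical Runge-Kutta integration from sigma(0) = 0, P_0(0) = 0."""
    h = t_end / steps
    s, p = 0.0, 0.0
    for _ in range(steps):
        k1 = rhs(s, p)
        k2 = rhs(s + h / 2 * k1[0], p + h / 2 * k1[1])
        k3 = rhs(s + h / 2 * k2[0], p + h / 2 * k2[1])
        k4 = rhs(s + h * k3[0], p + h * k3[1])
        s += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return s, p

def sigma_closed_form(t):
    d = ALPHA - BETA
    return BETA * (1.0 - math.exp(-d * t)) / d

def extinction_closed_form(t):
    e = math.exp((BETA - ALPHA) * t)
    return ALPHA * (e - 1.0) / (BETA * e - ALPHA)

def size_distribution(sigma, p0, n_max):
    """P_0 followed by P_n = (1 - P_0) sigma^(n-1) / (1 + sigma)^n, n = 1..n_max."""
    return [p0] + [(1.0 - p0) * sigma**(n - 1) / (1.0 + sigma)**n
                   for n in range(1, n_max + 1)]

sigma_num, p0_num = integrate(2.0, 2000)
dist = size_distribution(sigma_num, p0_num, 200)
print(sigma_num, p0_num, sum(dist))
```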
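3.4. The logarithmic distribution for the initial conditions $P_1(0) = 1$, σ(0) = 0, is given by 

$$P_n(t) = \frac{1-P_0}{\log(1+\sigma)}\,\frac{\sigma^n}{n(1+\sigma)^n} \qquad (n = 1, 2, \ldots).$$
We require 

$$\frac{n\sigma'}{\sigma(1+\sigma)} + \frac{d}{dt}\log\left(\frac{1-P_0}{\log(1+\sigma)}\right) = \frac{\alpha_{n+1}\,n\sigma}{(n+1)(1+\sigma)} - (\alpha_n+\beta_n) + \frac{\beta_{n-1}\,n(1+\sigma)}{(n-1)\sigma}, \tag{19}$$

together with two special equations for $P_0$ and $P_1$. Suppose polynomial solutions are sought 
for $\alpha_n$ and $\beta_n$. Then (19) when multiplied by n − 1 becomes a polynomial identity in n. Thus 
$\alpha_{n+1}$ is divisible by n + 1, and $\beta_{n-1}$ by n − 1. Whence, for n = 0, 
$$1 - P_0(t) = \text{constant} \times \log(1+\sigma).$$
This is inconsistent with $P_1(0) = 1$, and σ(0) = 0; thus polynomial solutions are not possible. 
But if $P_0(0) = 1$ and σ(0) = 0, such solutions exist. An artificial one is the disturbed 
geometric process, $\alpha_n = \alpha n$, $\beta_n = \beta n$ (n ≥ 1), where α and β are constants with α > β; and 
$\beta_0$ is a special immigration term to maintain the population if it should drop to zero, viz. 

$$\beta_0 = \frac{k\beta}{1 - k\log\left(1 + \dfrac{\beta}{\alpha-\beta}\left(1 - e^{-(\alpha-\beta)t}\right)\right)}, \quad\text{with } k < \frac{1}{\log\left(\dfrac{\alpha}{\alpha-\beta}\right)}.$$
Then 
$$P_n(t) = \frac{k\sigma^n}{n(1+\sigma)^n} \quad\text{for } n \ge 1,$$
$$P_0 = 1 - k\log\left[1 + \frac{\beta}{\alpha-\beta}\left(1 - e^{-(\alpha-\beta)t}\right)\right], \quad\text{where } \sigma = \frac{\beta}{\alpha-\beta}\left(1 - e^{-(\alpha-\beta)t}\right).$$
When k → 0, $\beta_0$ → 0, and $P_0(t)$ → 1, but 
$$\frac{P_n}{1-P_0} = \frac{1}{\log(1+\sigma)}\,\frac{\sigma^n}{n(1+\sigma)^n} \quad\text{for } n \ge 1.$$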
4. A STOCHASTIC DEVELOPMENT OF NEW SPECIES 


In the previous paragraphs the discussion has been mainly of the growth of a single species. 
In this paragraph a simple model for the evolution of a number of different species is 
considered as a possible source of heterogeneity. Yule (1924) discussed the groupings of species 
into genera when new species arise by mutation. He assumed that the total number of 
species in a genus is governed by a geometric birth process, so that the probability that a 
genus contains n species after a time of establishment t is $e^{-\lambda t}(1-e^{-\lambda t})^{n-1}$, for some 
constant λ. He remarks that one might also suppose that the probability rate of speciation is 
dependent on the population sizes of the various species, but gives no guide on the form this 
dependence might take. One might, as a first approximation, suppose that the larger the 
population of a species, the greater its chance of producing a member which might be 
classifiable as a member of a new species. We shall assume that each species develops from 
its original member with birth and death probability rates proportional to the population 
size n of the species. The extra assumption is that there is a rate of speciation also pro- 
portional to this number. Suppose it is kn. There are difficult, incompletely answered 
questions as to whether there is an exact time of appearance of a new species, what degree 
of difference it must show from other existing species, and what the effect is of relative 
geographical isolation of groups of members of a species. We have had to suppose that a 
species arises at a definite time. Our reply to the second question is that differences of 
sufficient size are likely to occur when the population size is big enough. Finally, if isolated 
groups come together again, it seems plausible that the distinctions necessary to stop inter- 
breeding will have most probably occurred at some past time when the total population size 
was large; so that again the proportionality of speciation rate to size, and the fixing of a 
definite time of birth of a species are relevant. Then, too, the use of the geometric law for 
each species enables us, because of its linear birth and death relationship, to consider the 
population as a whole even though it is split into groups. It is not pretended, however, that 
the assumptions made are anything but a rough approximation showing, perhaps, the sort 
of thing that might happen if more factors were taken into account. Thus because of the 
other factors involved, such as the struggle for a changing amount of food of varying 
quality, the constant k must be regarded as summing up the mutation rates and the 
environmental variability. 

We shall be concerned only with the number of species and not their genera or higher 
groupings. 

4.1. The distribution of the number of species that have been born in (0, t) is not easily 
obtainable. Suppose N species have been created at times $\tau_1, \tau_2, \ldots, \tau_N$, and that initially 
there was one species with one member. Then, if $g_n(t)$ is the probability that a species has 
n members after time t, the probability of a further birth in (t, t + dt) is 

$$k\sum_{i=0}^{N}\sum_{n=0}^{\infty} n\,g_n(t-\tau_i) = k\sum_{i=0}^{N}\bar n(t-\tau_i), \text{ say.}$$

Suppose 
$$k\sum_{i=0}^{j-1}\bar n(\tau_j-\tau_i) = \alpha_j(\tau_0, \ldots, \tau_j) \quad\text{with } \tau_0 = 0,$$
and 
$$p(0,\tau) = \exp\left(-k\int_0^{\tau}\bar n(x)\,dx\right).$$

Then it can be shown that $P_{N+1}(t)$, the probability that there are N + 1 species at time t, is 

$$\int_{R_N} p(0,t)\,p(0,t-\tau_1)\cdots p(0,t-\tau_N)\prod_{j=1}^{N}\alpha_j(\tau_0, \ldots, \tau_j)\,d\tau_1\ldots d\tau_N
= \int_{R_N} P(\tau_1, \ldots, \tau_N, t)\,d\tau_1\ldots d\tau_N, \text{ say,} \tag{20}$$

where $R_N$ is the region given by the inequalities $0 < \tau_1 < \tau_2$, $0 < \tau_2 < \tau_3$, ..., $0 < \tau_N < t$. $P_{N+1}(t)$ 
then follows a birth process with transition rate $\beta_{N+1}(t)$ given by 

$$\beta_{N+1}(t)\,P_{N+1}(t) = \int_{R_N} P(\tau_1, \ldots, \tau_N, t)\,\alpha_{N+1}(\tau_0, \ldots, \tau_N, t)\,d\tau_1\ldots d\tau_N. \tag{21}$$

It is hardly possible to use these results except for the simplest form of geometric 
distribution. It would be most satisfactory to deduce a multivariate distribution of the different 
numbers of species which at time t have 0, 1, ... members. Failing this we can find the 
average number of species having a given population. Since N follows a birth process, we 
may write the equation for the mean number, $\bar N$, of species at time t, 

$$\frac{d\bar N}{dt} = \sum_N \beta_NP_N.$$
Then the equation for the expected number $\bar N_n$ of species with n members is 

$$\bar N_n = \int_0^t \sum_N \beta_N(\tau)P_N(\tau)\,g_n(t-\tau)\,d\tau + g_n(t)
= \int_0^t \frac{d\bar N}{d\tau}\,g_n(t-\tau)\,d\tau + g_n(t). \tag{22}$$
When this is multiplied by n and summed, our initial assumption gives 
$$\frac{d\bar N}{dt} = k\int_0^t \frac{d\bar N}{d\tau}\,\bar n(t-\tau)\,d\tau + k\,\bar n(t). \tag{23}$$
This has the unique solution for the Laplace transforms φ of $\bar n$, and ψ of $d\bar N/dt$ 
$$\psi = \frac{k\phi}{1-k\phi}. \tag{24}$$

4.2. The simplest example occurs when both birth- and death-rates λ and μ are equal 
to λ. Then 

$$g_n(t) = \frac{(\lambda t)^{n-1}}{(1+\lambda t)^{n+1}} \quad (n = 1, 2, \ldots), \qquad g_0(t) = \frac{\lambda t}{1+\lambda t}, \quad\text{with } g_1(0) = 1.$$

This is perhaps the most realistic of the geometric laws, since it corresponds to a group of 
species so near, on the average, to equilibrium with their environment, that a new species 
has difficulty in establishing itself. R. A. Fisher (1930) holds that it is reasonable to suppose 
that most mutant genes that are likely to assist in the development of a species will probably 
disappear from the population in time. For instance, a gene giving a 1% advantage 
will die out with probability 0-98, while one giving no advantage will die out with 
probability 1. It is commonly considered that most mutants that can survive make 
little effective change in an organism’s relationship to its surroundings. Thus in the 
present artificial situation in which a new species is supposed to be created at a definite 
time, it must be assumed that it is not much better adapted than the others. If, as postulated, 
a species grows geometrically, its chance of possessing no members after time t tends to μ/λ if 
λ > μ, and to 1 otherwise. Amongst such laws of growth then, the one most representative 
of what happens is that for which λ = μ. Then $\bar n(t) = 1$, and $P_N(t) = e^{-kt}(1-e^{-kt})^{N-1}$ as in 
Yule’s case, $\bar N = e^{kt}$. When the distribution (22) is applied to actual data, the values of kt 
and λt required are such that the distribution is very near its limiting form 
$$p_n = \lim_{t\to\infty}\frac{\bar N_n}{\bar N} = \int_0^\infty \alpha e^{-\alpha y}\,\frac{y^{n-1}}{(1+y)^{n+1}}\,dy \quad (n = 1, 2, \ldots),$$
$$p_0 = 1 - \int_0^\infty \frac{\alpha e^{-\alpha y}}{1+y}\,dy, \quad\text{where } \alpha = k/\lambda. \tag{25}$$

The $p_n$ depend only on the ratio of k to λ, since these both have the dimension 1/time. The 

(i) $p_n > p_{n+1}$ for all α > 0, and n ≥ 1. 

(ii) $p_0$ may range from 0 as α → ∞, to 1 as α → 0. As α → ∞, $p_1$ → 1. 

(iii) The rth factorial moment of (25) is $r!\,(r-1)!/\alpha^{r-1}$. Hence for small k, despite the 
fact that $p_0(\alpha)$ → 1, the moments become unbounded. This is implied in the increasing 
average age, 1/α, of a species, since, as a species grows older, $g_n$ still increases if n − 1 > 2λ × 
its age, i.e. the larger contributions to the moments increase. When α is large, the ordinary 
moments all → 1. The overwhelming majority of species are only recently born, and have 
in virtue of the initial condition for each species only about one member each. 
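
The limiting probabilities (25) are simple to evaluate by quadrature. With the substitution s = y/(1 + y), the integral for p_n becomes an integral over (0, 1) of α exp(−αs/(1 − s)) s^{n−1}, and 1 − p_0 becomes the integral of α exp(−αs/(1 − s))/(1 − s). The Python sketch below uses this substitution with a plain midpoint rule; the substitution, the step count and the function names are choices made here for illustration, and α = 0.0029 is the value quoted later in the paper for the Macrolepidoptera data. An integration by parts gives p_1 = α p_0, which the sketch uses as a check alongside property (i).

```python
import math

def weight(s, a):
    """a * exp(-a * y) expressed in the variable s = y / (1 + y)."""
    return a * math.exp(-a * s / (1.0 - s))

def midpoint(f, steps=200_000):
    """Midpoint rule on (0, 1); the end-points themselves are never evaluated."""
    h = 1.0 / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))

def p_n(n, a):
    """p_n of (25) for n >= 1."""
    return midpoint(lambda s: weight(s, a) * s ** (n - 1))

def p_zero(a):
    """p_0 of (25)."""
    return 1.0 - midpoint(lambda s: weight(s, a) / (1.0 - s))

a = 0.0029
p0 = p_zero(a)
p1, p2, p3 = p_n(1, a), p_n(2, a), p_n(3, a)
print(p0, p1, p2, p3)
```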

4.3. Variation in α. One of the chief defects of the model is the assumption that all species 
have the same k and λ. In (24) k can be looked on as a mean value, if it is assumed to be 
independent of time, since it occurs linearly. λ does not occur till (25). In (25) then, it is 
possible to allow α = k/λ to have its own distribution. Because of (i) the resulting 
distribution of n will have the same general shape as (25). 

For (25) there is the recurrence relationship 

$$(n+1)\,p_{n+1} = n\,p_n + \alpha(p_1 + \ldots + p_n) + \alpha^2e^{\alpha}\,\mathrm{Ei}(-\alpha) \qquad (n = 1, 2, \ldots), \tag{26}$$
where $-\mathrm{Ei}(-\alpha) = \int_\alpha^\infty \dfrac{e^{-x}}{x}\,dx$ (British Association Tables, vol. 1) 
and $p_1 = \alpha(1 + \alpha e^{\alpha}\,\mathrm{Ei}(-\alpha)) = \alpha p_0$. 

The terms extra to $np_n$ on the right-hand side are of order $\alpha + \alpha^2e^{\alpha}\log\alpha$ for small α. The mean 
value of α will be small, since an ordinary birth is much more common than a 
species-generating birth. Hence one expects that $E_\alpha p_n$ will vary only a little from a logarithmic 
form, constant × $x^n/n$ for small 1 − x. But a small α implies by (iii) a distribution with 
very large moments. A distribution that leaves all moments of $E_\alpha p_n$ finite does not lead to 
easily calculable probabilities. The most obvious dF(α) is $dF(\alpha) = (c^m/\Gamma(m))\,e^{-c\alpha}\alpha^{m-1}\,d\alpha$. 
Then [m + 1] moments are finite. When c is large, as is required by fitted data, the new 
probabilities may most easily be calculated from 

m — mn +1) 

cn 





a [}ose— h(n +1) —y(mn + 2) + WL) + (2) + one 


te Bi (n+r)(m+2)...(m+r+1) 





(logc—y(n+r+1) 


ri(r+1)!er 
—Yim+r+2)+Wlr+ D+ylr+2)+...], (27) 
m(m + 2) c \™1l m(m+ 2) 
ae a c(m +1) (5) c(m +1) 


x [ vay — em 1) + EY yer 41) —Wm+r+1))+...].) 











(27) can be fitted from the first two moments. Thus if u is the observed mean, and v is the 
ratio of the observed second factorial moment and the mean, the equations are, since the 
zero class is not observable, 
$$v = \frac{2c}{m-1} \tag{28}$$
and 
$$\frac{1}{u} = 1 - p_0 = -E_\alpha\{\alpha e^{\alpha}\,\mathrm{Ei}(-\alpha)\}. \tag{29}$$

(28) gives c in terms of m; there will only be a solution for m in (29) if 
$$\frac{1}{u} > \frac{2}{v}\,e^{2/v}\,(-\mathrm{Ei}(-2/v)).$$

However, in the case tried, namely C. B. Williams’s data on different species of 
Macrolepidoptera caught in light traps over 4 years (Fisher, Corbet & Williams, 1943), a solution 
is not possible. The nearest approach is when c and m → ∞ in the limiting ratio 2/v, i.e. 
when the distribution is back to the form (25). Then a better fit is obtained from the same 
distribution with the value of α given by the equation $1/u = -\alpha e^{\alpha}\,\mathrm{Ei}(-\alpha)$. 

Generally, for any distribution of α, it is to be expected that the maximum-likelihood 
fitting of $E_\alpha p_n$ will be very difficult. If, then, one were to fit by moments, including the 
second, the equation $E_\alpha(2/\alpha) = v$ would be used. Since E(2/α) ≥ 2/E(α), the smallest mean 
of dF(α) would be 2/v, which is reached only when dF(α) tends to the form $P(\alpha \ne \alpha_0) = 0$, 
$P(\alpha_0) = 1$. For the sort of distribution dF(α) we would be likely to consider, this would 
mean that the variance of α would → 0 as E(α) → 1/E(1/α) = 2/v. The data, of typical ‘log’ 
shape, are fairly well fitted by (25) with α = 0.00290 and 2/v = 0.00326. Thus it seems likely 
that a dF(α) leading to a better fit will have to have a mean near 0.00290 and so tending to 
0.00326 from above. Thus it will have to have a low variance and there will be little point 
in using something like (27) rather than (25). An improvement might be found in allowing 
λ to be different from μ in the geometric law $g_n(t)$. 

4.4. The average distribution with unequal λ and μ. When λ > μ the limiting distribution 
of the type (22) has infinite moments of all but a few orders, and is thus less satisfactory 
than (25) as a description of data of this kind. 

When μ > λ the distribution is 

$$p_n = \frac{(k-(\mu-\lambda))(\mu-\lambda)}{\mu^2}\left(\frac{\lambda}{\mu}\right)^{n-1}\int_0^1 x^{-1+k/(\mu-\lambda)}\,(1-x)^{n-1}\left(1-\frac{\lambda}{\mu}x\right)^{-(n+1)}dx,$$

in which k − (μ − λ) must be > 0, as the average distribution only makes sense if a large 
number of species has been born ($\bar N$, the average number of species born up to time t, is 
proportional to exp{(k − (μ − λ))t}). 

If ξ = k/(μ − λ), $p_n$ may be written as 

$$p_n = \frac{(\xi-1)(\lambda/\mu)^{n-1}}{(\mu/(\mu-\lambda))^2}\,\frac{\Gamma(\xi)\,\Gamma(n)}{\Gamma(n+\xi)}\,F(n+1,\,\xi;\;n+\xi;\;\lambda/\mu) \qquad (n = 1, 2, \ldots).$$

If λ/μ is nearly 1, = η/(η + 1) say, the easiest form for calculation for $p_0$ is, when η is large, 

$$1 - p_0 = \frac{\xi-1}{\eta+1}\int_0^1 x^{\xi-1}\,(1-(\lambda/\mu)x)^{-1}\,dx$$

_ €-)@+y § ENE E+r— DHEEI- WED (5 
=u 


(n+ 1) 








log (9 + 1)- 
Then later $p_n$’s can be calculated from the recurrence relationship 

$$(n+1)\,p_{n+1} = \frac{\lambda}{\mu}\,n\,p_n + \frac{\xi-1}{\eta+1}\,(p_1 + \ldots + p_n) - \frac{\xi-1}{\eta+1}\,(1-p_0) \qquad (n = 1, 2, \ldots). \tag{31}$$

The chief difference between (31) and (27) is that in (31) the probabilities are exhibited as 
corrections to a logarithmic term, whereas in (27) they are given as corrections to a harmonic 
term. Both correction terms are small. The moments of the distribution determined by (31) 
are all finite. One may find ξ and η from the equations 

$$v = \frac{2\eta}{\xi+1}, \qquad \frac{1}{u} = \xi\int_0^1 \frac{x^{\xi-1}}{1+\eta(1-x)}\,dx. \tag{32}$$

The values of the right-hand side range from log(1 + v)/v when ξ = 1, down to 
$$(2/v)\exp(2/v)\,(-\mathrm{Ei}(-2/v))$$
as ξ → ∞ (when η is substituted in terms of ξ). For these data ξ = 7.3, η = 2549.0. The first 
25 frequencies and groupings up to and beyond 50 are compared with those for the log 
distribution. They agree closely after the first. The figures for the α distribution are so close 
to those for the ξ, η distribution as not to be worth recording separately. When α is small, 
as here (α = 0.00290), it is approximately 1 − x when the log distribution is given by 
$\bar N_n$ = const. $x^n/n$. The value of η shows that if this model is at all relevant, the value of λ/μ 
is almost indistinguishable from 1. The interesting suggestion from the ratio ξ/η is that one 
species-generating birth occurs about every 350 generations. 


Macrolepidoptera figures for all four years 

                       Expected number of species
   n       Obs.    ξ, η distribution    Log distribution (x = 0.99742)
   1        35          44.09                 40.14
   2        11          21.80                 20.03
   3        15          14.38                 13.32
   4        14          10.54                  9.96
   5        10           8.36                  7.95
   6        11           6.90                  6.66
   7         5           5.87                  5.65
   8         6           5.09                  4.93
   9         4           4.49                  4.37
  10         4           4.01                  3.92
  11         2           3.62                  3.56
  12         2           3.29                  3.25
  13         5           3.02                  3.00
  14         2           2.78                  2.77
  15         4           2.58                  2.58
  16         3           2.40                  2.42
  17         3           2.25                  2.27
  18         3           2.11                  2.14
  19         3           1.98                  2.02
  20         4           1.87                  1.92
  21         1           1.77                  1.82
  22         3           1.68                  1.73
  23         3           1.60                  1.65
  24         1           1.52                  1.58
  25         3           1.45                  1.51
  26-30      7           6.40                  6.71
  31-35      3           5.27                  5.62
  36-40      5           4.46                  4.81
  41-45      5           3.84                  4.19
  46-50      3           3.36                  3.71
  Over 50   60          57.22                 63.81
  Total    240
4.5. Use of the model for a finite distribution. Suppose as in (1) that species of plants are 
spreading over M regions. Then the spread may possibly be described by some simple 
birth and death process ($\alpha_n$, $\beta_n$), with $\alpha_n$ and $\beta_n$ not depending on time; we might, for 
instance, consider the logistic process $\alpha_n = \mu n^2$, $\beta_n = \lambda n(M-n)$, where n is the number of 
regions occupied by a species. If one supposes that the mean number $\bar N$ of species increases 
exponentially as in 4.2 and 4.4, the average rate of production of new species is 

$$\sum_N \beta_N(t)P_N(t) = a\,e^{at},$$
where a is a constant. Then the age distribution of the $\bar N = e^{at}$ species → $a\,e^{-ax}\,dx$ as t → ∞. 
Suppose the probability that a species of age x occupies n regions is $g_n(x)$ when $g_1(0) = 1$. 

The limiting distribution $p_n$ that we have been using can then be obtained directly from the 
equations of the process for the $g_n(x)$. Thus 

$$\begin{aligned}
a\int_0^\infty e^{-at}\,\frac{dP_n(t)}{dt}\,dt &= a^2\int_0^\infty e^{-at}P_n(t)\,dt - a\,P_n(0)\\
&= \alpha_{n+1}\,a\int_0^\infty e^{-at}P_{n+1}\,dt - (\alpha_n+\beta_n)\,a\int_0^\infty e^{-at}P_n\,dt\\
&\quad + \beta_{n-1}\,a\int_0^\infty e^{-at}P_{n-1}\,dt \qquad (M > n > 1),
\end{aligned} \tag{33}$$
or, as $a\int_0^\infty e^{-at}P_n\,dt = p_n$, 
$$a\,p_n = \alpha_{n+1}p_{n+1} - (\alpha_n+\beta_n)p_n + \beta_{n-1}p_{n-1}, \tag{34}$$

with the special end equations, 

$$-a + a\,p_1 = \alpha_2p_2 - (\alpha_1+\beta_1)p_1 + \beta_0p_0,$$
$$a\,p_0 = \alpha_1p_1 - \beta_0p_0,$$
and 
$$a\,p_M = -\alpha_Mp_M + \beta_{M-1}p_{M-1}.$$

The equations add to give a recurrence relationship such as (26) or (31), 
$$\alpha_{n+1}p_{n+1} = \beta_np_n + a(p_0 + \ldots + p_n) - a \qquad (n = 1, 2, \ldots). \tag{35}$$

When $p_0$ can be found, as is the case in 4.2, (35) can be used to give the $p_n$’s. For n < M, $p_n$ 
is of the form $A_np_0 + B_n$, the coefficients $A_n$ and $B_n$ being obtained from (35). Then $p_0$ can 
be found by addition of the probabilities. 

To show the possibility of such ‘explanation’ of the distribution of species, we can fit
data (kindly supplied by Prof. A. R. Gemmell) on the spread of moss species over the six
islands of Hawaii. The process $\alpha_n = 0$, $\beta_n = \lambda n(M-n)$ has been used. This is not quite
unrealistic, as a species is only observed when it has a large number of plants and so is not
so likely to disappear. However, even with $\alpha_n = 0$, the fits are good:

\[ p_n = \frac{(n-1)(M-(n-1))}{y+n(M-n)}\cdot\frac{(n-2)(M-(n-2))}{y+(n-1)(M-(n-1))}\cdots\frac{1(M-1)}{y+2(M-2)}\cdot\frac{y}{y+1(M-1)}, \]

\[ p_M = \frac{M-1}{y}\,p_{M-1}, \qquad y = \frac{a}{\lambda}. \tag{36} \]

The figures are, for the three different reproductive types of species, dioecious, monoecious
and sterile:


Moss species in Hawaii

       Dioecious              Monoecious               Sterile
 n    Obs.    Exp.       n    Obs.    Exp.       n    Obs.    Exp.
 1     12    11.19       1     15    13.88       1     22    21.78
 2      6     5.70       2      6     5.64       2      7     7.19
 3      4     4.22       3      3     3.39       3      5     3.56
 4      3     3.87       4      2     2.48       4      1     2.11
 5      4     4.53       5      0     2.13       5      0     1.39
 6     13    12.49       6      4     2.47       6      2     0.97

      y = 1.816              y = 4.308              y = 7.156
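
The expected frequencies in the dioecious column can be checked numerically. The sketch below is not the author's computation: it assumes that, with $\alpha_n = 0$, successive probabilities satisfy $p_n = p_{n-1}(n-1)(M-n+1)/(y+n(M-n))$ with $p_1 = y/(y+M-1)$, which is how equation (36) is read here, and it takes the 42 observed dioecious species as the total. The function name is hypothetical.

```python
def occupancy_distribution(y, M):
    """Return [p_1, ..., p_M] for the pure-spread process (alpha_n = 0).

    Assumed reading of equation (36): p_1 = y / (y + (M - 1)) and
    p_n = p_{n-1} * (n - 1)(M - n + 1) / (y + n(M - n)).
    """
    p = [y / (y + (M - 1))]
    for n in range(2, M + 1):
        ratio = (n - 1) * (M - (n - 1)) / (y + n * (M - n))
        p.append(p[-1] * ratio)
    return p


M = 6                                      # six islands
dioecious = occupancy_distribution(1.816, M)
expected = [42 * q for q in dioecious]     # 42 dioecious species in all
print([round(e, 2) for e in expected])
```

Under these assumptions the probabilities sum to one and the expected counts agree with the tabulated 11.19, 5.70, 4.22, 3.87, 4.53 and 12.49 to within rounding.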

















y was calculated in each case from the maximum-likelihood equation 


1 1 1 
2 Fsquarayt tye) 7 25, 

4-6. Remarks. The simple assumption of proportionality of speciation rate to the total 
population thus leads to distributions substantially like those observed. Yule’s data for the 
increasing number of species in a genus bears out the exponential manner of increase found 
for N. For such an N the form of the p, can always be found as in 4-5, and more realistic 
growth laws for the different species may account for any divergences in observed data from 
a logarithmic form. For instance, random immigration of new species or of members of 
existing species (which might include back mutation to other already existing species) may 
be considered. But these particular effects are of small weight in the p,, unless their rate of 
occurrence is of as high an order as the rate of increase of N. A more important change might 
be the consideration of competitive populations described by something like the logistic 
law, but they are possibly best considered in relation to the shifting of the population 
inside the living region. 


I wish to thank Prof. M.S. Bartlett for the suggestions he made in the course of this work, 
and Mr A. M. Walker for his helpful comments. 


REFERENCES 


FISHER, R. A. (1930). Genetical Theory of Natural Selection. Oxford University Press.
FISHER, R. A., CORBET, A. S. & WILLIAMS, C. B. (1943). J. Anim. Ecol. 12, 42.
KENDALL, D. G. (1948a). Biometrika, 35, 6.
KENDALL, D. G. (1948b). Ann. Math. Statist. 19, 1.
POLYA, G. (1930). Ann. Inst. Poincaré, 1, 117.
SKELLAM, J. G. (1948). J.R. Statist. Soc. B, 10, 257.
WILLIAMS, C. B. (1944). J. Ecol. 32, 1.
YULE, G. U. (1924). Phil. Trans. B, 213, 21.










MODIFICATIONS TO THE VARIATE-DIFFERENCE METHOD 


By M. H. QUENOUILLE 
Institute of Statistics, University of Oxford 


1. INTRODUCTION 


1-1. Methods of trend elimination in time-series analysis fall into two main groups. First,
where the form of trend is known or may be assumed to be polynomial, it is possible to fit 
polynomials and to use residuals from this fitting in further analysis. This method has been 
described elsewhere (Quenouille, 1952). 

A second, more exploratory, method which assumes nothing about the form of the trend 
makes use of moving average difference formulae for its estimation. This method is, however, 
subject to the criticisms that the form and extent of moving averages chosen would seem 
to be somewhat arbitrary, and that the elimination process may easily give rise to misleading 
results (Spencer-Smith, 1947; Quenouille, 1949). 


1-2. The variate-difference method (Anderson, 1914) is closely related to the method of 
moving averages, and may be regarded as having greater generality. Any moving average 
difference formula may be shown to be equivalent to an average of higher order differences 
and, consequently, results obtainable from such formulae may be obtained by appropriate 
manipulation of the variate-differences. For example, if $\nabla^p x_i$ is used to denote the pth
variate difference, i.e.

\[ \nabla^p x_i = x_{i+p} - \binom{p}{1}x_{i+p-1} + \binom{p}{2}x_{i+p-2} - \cdots, \]
then a simple five-term moving average difference formula may be written
\[ 5d_j = -x_{j+2} - x_{j+1} + 4x_j - x_{j-1} - x_{j-2} = -\nabla^4 x_{j-2} - 5\nabla^2 x_{j-1}. \]


Results obtainable using the residuals from such a formula are therefore derivable from the 
variate-differences. 


1-3. The classical variate-difference method has therefore a greater flexibility than the 
use of moving averages, since the effect of the latter may be judged from the variate- 
difference. Further, it has the advantage that criteria and tests of significance can be set 
up to judge the extent of the trend. 

A major disadvantage still exists in that the existence of serial correlation in the residual 
element of the series will bias the results and produce misleading conclusions about the form 
of the series. Also, it is not clear that the tests used when serial correlation is not present in 
the residual element are the best possible. 


1-4. The following paper sets out to do two things. The first part, §§ 2-6, is concerned with
deriving more sensitive tests of the existence of trend in variate-differences of any order,
and with finding more accurate estimates of variances and covariances of the residual
element from any set of variate-differences assumed to be trend free. The latter part,
§§ 7-10, is concerned with how this theory should be modified if serial correlation is assumed
to exist in the residuals. As such it does not give a strict criterion for the separation of trend
and serial correlation, but it does provide a basis on which the separation of the two may
take place.


For example, the analysis of § 7-3 indicates the existence of both trend and serial correlation
in Kendall’s sheep series. It shows further that trend is unlikely to be important in the
third and higher variate-differences. The analysis of § 9-5 shows further that, if it is accepted
that there is no trend in the third and higher variate-differences, at least a second-order
autoregressive process is needed to represent the series and that, if such is used, the variance
of the trend in the second variate-differences is unlikely to be more than 0.4. The conclusion
is therefore that a second-order autoregressive scheme with a trend, small in its second
differences, is a good representation of the series. As emphasized elsewhere (Quenouille,
1949), this is not, however, the only possible representation, and it would still be possible
to maintain that the series consisted of a complicated trend with a random element or,
possibly, of a high-order autoregressive scheme with no trend. The process to be outlined
gives rise to the simplest possible representation for any time series.


2. GENERAL INVESTIGATION 


2-1. The classical variate-difference method is based upon the idea that by taking
successive differences of a series of observations, the ‘smooth’ part or trend of the observations
will be eliminated, and what remains can be used to find out about the random
element. In particular, if the series $x_1, x_2, \ldots, x_n$ is free from serial correlation and no trend
component exists in the pth variate-difference, $\nabla^p x_i$, then

\[ \frac{(\nabla^p x_i)^2}{\binom{2p}{p}} \]
estimates the variance, $\sigma^2$, of the random element.
Any average of a series of such terms also estimates $\sigma^2$, and, commonly, estimates such as

\[ V_p = \frac{\sum_{i=1}^{N} (\nabla^p x_i)^2}{N\binom{2p}{p}} \]
are used for this purpose.
For reasons to be seen later, we shall use a modified definition which applies end-corrections
to the averaging process.


This will not affect the large sample theory, but it will have the joint effects of making 
it more applicable to small samples and of simplifying conversion formulae. 


2-2. We shall define a series of summation functions $\Sigma_m$ by the formulae:
\[ \Sigma_0\, x_iy_i = x_1y_1 + x_2y_2 + \cdots + x_ny_n, \]
\[ \Sigma_1\, x_iy_i = \tfrac12 x_1y_1 + x_2y_2 + \cdots + x_{n-1}y_{n-1} + \tfrac12 x_ny_n, \]
\[ \Sigma_2\, x_iy_i = \tfrac14 x_1y_1 + \tfrac34 x_2y_2 + \cdots + \tfrac34 x_{n-1}y_{n-1} + \tfrac14 x_ny_n, \]

the summation in each case covering all values up to $x_ny_n$. The general law of formation is
determined by the recurrence relation

\[ \Sigma_m^{\,n}\, x_iy_i = \Sigma_{m-1}^{\,n-1}\, \tfrac12(x_iy_i + x_{i+1}y_{i+1}), \]

so that, for example, the first three coefficients in $\Sigma_3$ are $\tfrac18$, $\tfrac12$ and $\tfrac78$.

2-3. For any series, $x_1, x_2, \ldots, x_{N+m}$, the estimated variances, ${}_mV_0, {}_mV_1, {}_mV_2, \ldots$, will be defined by

\[ {}_mV_0 = \frac{\Sigma_m\, x_i^2}{N}, \qquad {}_mV_p = \frac{\Sigma_{m-p}\,(\nabla^p x_i)^2}{N\binom{2p}{p}}. \]

Under this definition, p cannot exceed m. However, m may be arbitrarily chosen so that it
can always be arranged that it is equal to or greater than the largest value of p required.

In the following sections, m will be taken as equal to the largest value of p occurring. This
is usually ten. Further, the prefix m will, in general, be omitted unless the line of argument
requires its inclusion.
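
The end-corrections implied by the summation functions of § 2-2 can be generated mechanically. The sketch below rests on an assumption, since the printed fractions are not legible in this copy: the recurrence is read as replacing each term by the mean of itself and its successor, starting from unit weights. The function name is hypothetical.

```python
from fractions import Fraction


def end_weights(m, n):
    """Weights given by the m-th summation function to n terms (n > 2*m).

    Assumed recurrence: Sigma_m on n terms equals Sigma_{m-1} on the n - 1
    pairwise means (t_i + t_{i+1}) / 2, with Sigma_0 giving unit weights.
    """
    w = [Fraction(1)] * (n - m)            # Sigma_0 applied to n - m terms
    for _ in range(m):
        new = [Fraction(0)] * (len(w) + 1)
        for i, wi in enumerate(w):
            new[i] += wi / 2               # half of the weight stays on term i
            new[i + 1] += wi / 2           # half moves to term i + 1
        w = new
    return w


print(end_weights(1, 5))   # ends weighted 1/2
print(end_weights(3, 8))   # ends taper as 1/8, 1/2, 7/8
```

On this reading the weights of $\Sigma_m$ over n terms always total n - m, which is what allows estimates formed from differences of different orders to be averaged over the same effective number of terms.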


2-4. Serial correlation coefficients may be similarly defined, so that

\[ {}_mr_p = \frac{\Sigma_{m-p}\, x_ix_{i+p}}{\Sigma_m\, x_i^2}. \]

Again, m must be chosen to equal or exceed p, and, with this condition, the suffix m may
usually be omitted.

2-5. It is then not very difficult to derive the following identities (see Kendall, 1946, p. 23):

\[ V_p = V_0\left[1 - \frac{2p}{p+1}r_1 + \frac{2p(p-1)}{(p+1)(p+2)}r_2 - \frac{2p(p-1)(p-2)}{(p+1)(p+2)(p+3)}r_3 + \cdots\right], \]














PX p —1) > _ p(p?—1)(p?—4) 
aN oN en ae ae 
These formulae demonstrate the basic equivalence of the approaches employing variate-differences
and serial correlations. They also provide a simple way in which to find the
variances of combinations of the $V_i$ under different conditions. For instance,

\[ \operatorname{var} V_p = \frac{2\sigma^4}{N}\left[1 + \frac{2p^2}{(p+1)^2} + \frac{2p^2(p-1)^2}{(p+1)^2(p+2)^2} + \cdots\right] \]
for a random series.
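
As a numerical illustration of the variance series for a random series, the sketch below sums it term by term. It assumes the fourth cumulant is zero and reports the factor multiplying $\sigma^4/N$; the function name is hypothetical.

```python
def variance_factor(p):
    """Factor f such that var(V_p) = f * sigma^4 / N for a random series.

    Sums 2 * [1 + 2 w_1^2 + 2 w_2^2 + ...] with
    w_k = p(p-1)...(p-k+1) / ((p+1)(p+2)...(p+k)); the series ends at k = p.
    """
    total, w = 1.0, 1.0
    for k in range(1, p + 1):
        w *= (p - k + 1) / (p + k)
        total += 2.0 * w * w
    return 2.0 * total


for p in (1, 2, 3, 4, 10):
    print(p, round(variance_factor(p), 4))
```

The factors for p = 1, 2, 3, 4 and 10 come to about 3.0000, 3.8889, 4.6200, 5.2531 and 8.0766, which are the values tabulated in the column p = 0 of the table in § 2-12.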
2-6. When a set of variate-differences has been calculated as described above, it is
reasonable to ask what is the best estimate of the residual or error variance in any series that
may be obtained using these estimates. If it is supposed that $V_{m+1}, \ldots, V_{m+p}$ are free from any
trend component, then the problem is to find coefficients $c_i$ $(i = m+1, \ldots, m+p)$ so that
$\sum_{i=m+1}^{m+p} c_i = 1$ and so that $\sum_{i=m+1}^{m+p} c_iV_i$ has the smallest possible variance.

This problem may be solved by finding a restricted minimum. If we note that
\[ \frac{N}{\sigma^4}\left[\operatorname{cov}(V_i, V_j) - \frac{\kappa_4}{N}\right] = a_{ij} = \frac{2\binom{2i+2j}{i+j}}{\binom{2i}{i}\binom{2j}{j}}, \]
where $\kappa_4$ is the fourth cumulant of the residual elements, and will normally be assumed to be
zero, and $(\kappa_4 + \lambda\sigma^4)/N$ is supposed to be the variance of $\sum_{i=m+1}^{m+p} c_iV_i$, the coefficients $c_i$ are given
by the symmetrical equations

\[ \sum_{i=m+1}^{m+p} a_{ij}c_i - \lambda = 0 \quad (j = m+1, \ldots, m+p), \qquad \sum_{i=m+1}^{m+p} c_i = 1. \]

Solution of these equations gives rise to the values for $c_i$ and $\lambda$.
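
The restricted minimum can be computed exactly for small p. The sketch below is an illustration rather than the author's procedure: it assumes $a_{ij} = 2\binom{2i+2j}{i+j}/[\binom{2i}{i}\binom{2j}{j}]$ (the form that follows from the identities of § 2-5 for a random series with zero fourth cumulant) and solves the equations through $c = A^{-1}\mathbf{1}/(\mathbf{1}'A^{-1}\mathbf{1})$, $\lambda = 1/(\mathbf{1}'A^{-1}\mathbf{1})$. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def a(i, j):
    """Assumed covariance factor N * cov(V_i, V_j) / sigma^4 (random series)."""
    return Fraction(2 * comb(2 * i + 2 * j, i + j),
                    comb(2 * i, i) * comb(2 * j, j))


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination on Fractions."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def best_combination(m, p):
    """Coefficients on V_{m+1}..V_{m+p} and the minimum variance factor."""
    idx = range(m + 1, m + p + 1)
    A = [[a(i, j) for j in idx] for i in idx]
    u = solve(A, [Fraction(1)] * p)        # u = A^{-1} 1
    lam = 1 / sum(u)
    return [lam * ui for ui in u], lam


c, lam = best_combination(0, 3)
print([float(x) for x in c], float(lam))
```

For m = 0 this gives coefficients 2.50 and -1.50 with $\lambda$ = 2.50 when p = 2, and 4.67, -7.00, 3.33 with $\lambda$ = 2.33 when p = 3, in agreement with the particular solutions tabulated in § 2-7.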
2-7. The following particular solutions exist for these equations:

            p = 1                   p = 2
 m     c_{m+1}    λ        c_{m+1}   c_{m+2}     λ
 0        1      3.00        2.50     -1.50     2.50
 1        1      3.89        3.86     -2.86     3.00
 2        1      4.62        5.20     -4.20     3.43
 3        1      5.25        6.54     -5.54     3.82
 4        1      5.82        7.88     -6.88     4.17
 5        1      6.33        9.21     -8.21     4.49
 6        1      6.81       10.55     -9.55     4.79

                   p = 3                                      p = 4
 m     c_{m+1}   c_{m+2}   c_{m+3}    λ       c_{m+1}    c_{m+2}    c_{m+3}    c_{m+4}     λ
 0       4.67     -7.00      3.33    2.33       7.50     -20.25      22.50      -8.75     2.25
 1      10.36    -17.91      8.55    2.69      22.71     -67.28      72.25     -26.68     2.52
 2      18.18    -33.04     15.86    3.00      50.37    -152.56     161.08     -57.90     2.77
 3      28.14    -52.43     25.29    3.28      94.33    -287.71     300.79    -106.42     3.00
 4      40.23    -76.09     36.86    3.55     159.00    -486.23     505.16    -176.92     3.21
 5      54.45   -104.01     50.56    3.79     245.2     -749.6      774.4     -269.0      3.41
 6      70.79   -136.17     66.38    4.02     356.6    -1088.9     1119.9     -386.7      3.60


























It may be seen that fairly substantial improvements in accuracy can be achieved by using
several values of $V_i$ simultaneously. For instance, the variance of $V_4$ is 5.25. By combining
$V_4$ and $V_5$ the variance is reduced to 3.82, while the use of $V_4$ to $V_7$ reduces it to 3.00.


2-8. It should be noted that the coefficients, $c_i$, do not differ appreciably from those that
would be used if the $a_{ij}$ were regarded as functions of $i$ for which the value corresponding
to $i = -1$ was required. For instance, in this case, for p = 2, we should have $c_{m+1} = m+3$,
$c_{m+2} = -(m+2)$; for p = 3,

\[ c_{m+1} = (m+3)(m+4)/2, \quad c_{m+2} = -(m+2)(m+4), \quad c_{m+3} = (m+2)(m+3)/2. \]

This provides rough empirical approximations to the coefficients, $c_i$, but for practical
purposes it is convenient to deal with a still rougher approximation, derived by extrapolating
the value of $V_i$ corresponding to $i = 0$, using the formula

\[ V_0 = V_m - m\nabla V_m + \frac{m(m+1)}{2!}\nabla^2 V_m - \cdots. \]

If this is used, the following coefficients are obtained:

        p = 1            p = 2                    p = 3                             p = 4
 m   c_{m+1}   λ     c_{m+1} c_{m+2}   λ     c_{m+1} c_{m+2} c_{m+3}   λ     c_{m+1} c_{m+2} c_{m+3} c_{m+4}   λ
 0      1    3.00       2      -1    2.56       3      -3       1    2.42       4      -6       4      -1    2.35
 1      1    3.89       3      -2    3.08       5      -6       2    2.89       8     -15      11      -3    2.75
 2      1    4.62       4      -3    3.53      10     -15       6    3.17      20     -45      36     -10    2.98
 3      1    5.25       5      -4    3.93      15     -24      10    3.48      35     -84      70     -20    3.24
 4      1    5.82       6      -5    4.29      21     -35      15    3.76      56    -140     120     -35    3.48
 5      1    6.33       7      -6    4.62      28     -48      21    4.03      84    -216     189     -56    3.71
 6      1    6.81       8      -7    4.93      36     -63      28    4.28     120    -315     280     -84    3.92





















































2-9. This table shows that an appreciable increase in the efficiency of estimation of the 
residual variance is obtained even with simple extrapolation. The efficiency of this extra- 
polation method as compared with the most efficient linear combinations, obtained in § 2-7, 
is as follows:

 m     p = 1    p = 2    p = 3    p = 4
 0     100 %     98 %     96 %     96 %
 1     100 %     97 %     93 %     92 %
 2     100 %     97 %     95 %     93 %
 3     100 %     97 %     94 %     93 %
 4     100 %     97 %     94 %     92 %
 5     100 %     97 %     94 %     92 %
 6     100 %     97 %     94 %     92 %























The loss in efficiency increases with both m and p. However, in view of the computational 
convenience and other properties, to be given later, of the extrapolation method, its use is 
generally to be advocated. Any loss in efficiency from its use can be compensated by 
employing more values of $V_i$.
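
The extrapolation weights and their variance factors can be reproduced as follows. This is a sketch under two assumptions: that $\nabla$ acts as a forward difference on the suffix, $\nabla V_s = V_{s+1} - V_s$, with the extrapolation started at $s = m+1$, and that $a_{ij} = 2\binom{2i+2j}{i+j}/[\binom{2i}{i}\binom{2j}{j}]$ for a random series. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def a(i, j):
    """Assumed covariance factor N * cov(V_i, V_j) / sigma^4 (random series)."""
    return Fraction(2 * comb(2 * i + 2 * j, i + j),
                    comb(2 * i, i) * comb(2 * j, j))


def extrapolation_weights(m, p):
    """Weights on V_{m+1}, ..., V_{m+p} when extrapolating back to V_0.

    Expands V_s - s*D + s(s+1)/2! * D^2 - ... (s = m + 1, D = forward
    difference), truncated after the (p - 1)-th difference.
    """
    s = m + 1
    return [(-1) ** j * sum(comb(s + k - 1, k) * comb(k, j) for k in range(j, p))
            for j in range(p)]


def extrapolation_variance_factor(m, p):
    """Variance of the extrapolated estimate in units of sigma^4 / N."""
    c = extrapolation_weights(m, p)
    idx = list(range(m + 1, m + p + 1))
    return sum(ci * cj * a(i, j) for ci, i in zip(c, idx) for cj, j in zip(c, idx))


print(extrapolation_weights(0, 4), float(extrapolation_variance_factor(0, 4)))
print(extrapolation_weights(2, 3), float(extrapolation_variance_factor(2, 3)))
```

For m = 0, p = 4 the weights are 4, -6, 4, -1 with a factor of about 2.35, and for m = 2, p = 3 they are 10, -15, 6 with a factor of about 3.17, matching the corresponding entries of the table in § 2-8. Dividing the factor of the most efficient combination by these values gives the efficiencies quoted in § 2-9.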


2-10. If it may be presumed that $V_{m+1}, \ldots, V_{m+p}$ are free from trend, the trend component
in $V_m$ may be estimated by comparing it with $V_{m+1}, \ldots, V_{m+p}$. A further question is then posed:
which estimate of the trend component in $V_m$ is the most accurate? To answer this, we have
to find coefficients $d_i$ $(i = m+1, \ldots, m+p)$, subject to $\sum_{i=m+1}^{m+p} d_i = 1$, such that the variance,
$(\kappa_4 + \lambda\sigma^4)/N$, of $V_m - \sum_{i=m+1}^{m+p} d_iV_i$ is a minimum. These coefficients may be shown to satisfy the
equations

\[ \sum_{i=m+1}^{m+p} (a_{ij} - a_{im})\,d_i - \lambda = a_{jm} - a_{mm} \quad (j = m+1, \ldots, m+p), \qquad \sum_{i=m+1}^{m+p} d_i = 1. \]

Solution of these equations gives rise to values for $d_i$ and $\lambda$.

2-11. The following particular solutions exist for these equations:

          p = 1                    p = 2                               p = 3
 m   d_{m+1}      λ        d_{m+1}  d_{m+2}      λ        d_{m+1}  d_{m+2}  d_{m+3}      λ
 1      1      0.222222     2.327   -1.327   0.030612      4.081   -5.388    2.306   0.007752
 2      1      0.108889     2.230   -1.230   0.006949      3.763   -4.636    1.873   0.000924
 3      1      0.067347     2.177   -1.177   0.002469      3.587   -4.239    1.652   0.000202
 4      1      0.046838     2.143   -1.143   0.001113      3.475   -3.992    1.517   0.000063
 5      1      0.034974     2.121   -1.121   0.000582      3.402   -3.834    1.432   0.000022
 6      1      0.027391     2.104   -1.104   0.000337      3.345   -3.713    1.368   0.000010











It may be seen that these coefficients tend to binomial coefficients, i.e. $d_{m+i} \to (-1)^{i+1}\binom{p}{i}$,
and this may be proved mathematically without much difficulty. (In the limit, the
coefficients $a_{ij} - a_{im}$ may be accurately represented as polynomials in $j$. It follows that
a limiting solution is given by the differencing process, $d_{m+i} = (-1)^{i+1}\binom{p}{i}$ and $\lambda = 0$.)

Estimates of trend components are thus given by variate-differences of the $V_i$.


2-12. The function $\sum_{i=m}^{m+p} (-1)^{i-m}\binom{p}{i-m} V_i$ thus estimates the trend in $V_m$, if $V_{m+1}, \ldots, V_{m+p}$
are free from trend. Its variance is given by


vei EC 





<0 jm = + *) ‘e + *) Tom 


m+t m+) 


This is of order m*-*?, The following table gives the values of A: 
































                              Values of p
 m       0         1          2           3           4           5
 1    3.0000    0.2222    0.042222    0.015918    0.007927    0.004602
 2    3.8889    0.1089    0.010522    0.002394    0.000796    0.000330
 3    4.6200    0.0673    0.003935    0.000599    0.000142    0.000043
 4    5.2531    0.0468    0.001833    0.000200    0.000035    0.000007
 5    5.8187    0.0350    0.000980    0.000080    0.000011    0.000002
 6    6.3346    0.0274    0.000577    0.000037    0.000004       —
 7    6.8118    0.0222    0.000364    0.000019       —           —
 8    7.2578    0.0185    0.000242       —           —           —
 9    7.6781    0.0157       —           —           —           —
10    8.0766       —         —           —           —           —








For completeness, the variances of the $V_m$ have been included in this table in the column p = 0.
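
The entries of this table can be recomputed directly. The sketch below assumes, as an illustration only, that the trend estimate is $\sum_{i=0}^{p}(-1)^i\binom{p}{i}V_{m+i}$ and that $a_{ij} = 2\binom{2i+2j}{i+j}/[\binom{2i}{i}\binom{2j}{j}]$ for a random series with zero fourth cumulant; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def a(i, j):
    """Assumed covariance factor N * cov(V_i, V_j) / sigma^4 (random series)."""
    return Fraction(2 * comb(2 * i + 2 * j, i + j),
                    comb(2 * i, i) * comb(2 * j, j))


def trend_variance_factor(m, p):
    """Variance of sum_i (-1)^i C(p, i) V_{m+i} in units of sigma^4 / N."""
    d = [(-1) ** i * comb(p, i) for i in range(p + 1)]
    return sum(d[i] * d[j] * a(m + i, m + j)
               for i in range(p + 1) for j in range(p + 1))


for m in (1, 2):
    print(m, [round(float(trend_variance_factor(m, p)), 6) for p in range(3)])
```

For m = 1 this gives 3, 0.2222 and 0.042222 for p = 0, 1 and 2, and for m = 2 it gives 3.8889, 0.1089 and 0.010522, as tabulated.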





2-13. The efficiency of estimation of trend components using this approach is given in the
following table: 








 m     p = 1    p = 2    p = 3
 1     100 %     73 %     49 %
 2     100 %     66 %     39 %
 3     100 %     63 %     34 %
 4     100 %     61 %     31 %
 5     100 %     59 %     28 %
 6     100 %     58 %     27 %




















It is again true that the efficiency decreases as m and p are increased. However, the loss in
efficiency may be easily compensated by including one or two extra values of $V_i$, i.e. by
increasing p.


2-14. The two processes of estimating the residual variance and the trend components
may thus be carried out using differences of the quantities $V_i$. However, it must be remarked
that this is not the only possible approximate basis for estimation. An alternative, which is
likely to be somewhat better, would acknowledge the fact that the variance of $V_m$ increases
roughly linearly with $\sqrt{m}$ by extrapolating to zero using a suitably chosen scale instead
of m.

One such scale appears to be provided by $\sqrt{m+1} - 1$. If this is employed, with the method
of divided differences, more efficient estimates can be obtained. The extra computation
would, however, appear to be seldom worth while.


2-15. It might be asked what is the probable bias resulting from the failure to eliminate 
trend completely. 

Suppose, for instance, that extrapolation is carried out using J, ... V,,,,, but that a trend 
component o} is present in V,,. The error in the extrapolated estimate, V4, is then 











me —1) a 
Amott... MOEN mtP— Ng (mt) (4 9). og 
However, the variance of 0? is 
(*) (f 4m + 2% + 29 
aot S$ (_ yt) amsing der 
m+t m+j 


Thus the variance of the error in V, is 





(m+1)(m+2)... manne A'o4 
N 


p! in” 


$\lambda'$ is of order $\sqrt{m}/p$.
The following table gives a series of values for $\lambda'$:











                      p
 m      1       2       3       4       5
 1    0.89    0.38    0.25    0.20    0.17
 2    0.98    0.38    0.24    0.18    0.15
 3    1.08    0.39    0.24    0.17    0.13
 4    1.17    0.41    0.24    0.17    0.11
 5    1.26    0.43    0.25    0.17    0.13
 6    1.34    0.45    0.26    0.18     —
 7    1.42    0.47    0.27     —       —
 8    1.50    0.49     —       —       —
 9    1.57     —       —       —       —


























As a consequence of the fact that A’ decreases as p increases, we may say that the bias 
in the estimate of Vj decreases with ,/(). 


3. EXAMPLES 


3-1. This theory may be illustrated using the two series given in the following table. The 
figures are in hundreds: 


















































  i      1      2       i      1      2       i      1      2       i      1      2
  1   1000   1018      14    273    467      27     74    471      40     20    170
  2    905    927      15    247    465      28     67    458      41     18    148
  3    819    847      16    223    465      29     61    442      42     17    128
  4    741    776      17    202    468      30     55    423      43     15    109
  5    670    714      18    183    473      31     50    402      44     14     93
  6    607    661      19    165    477      32     45    378      45     12     78
  7    549    615      20    150    483      33     41    353      46     11     65
  8    497    576      21    135    487      34     37    327      47     10     54
  9    449    543      22    122    490      35     33    299      48      9     44
 10    407    518      23    111    492      36     30    272      49      8     36
 11    368    498      24    100    491      37     27    245      50      7     29
 12    333    483      25     91    488      38     25    219      51      7     25
 13    301    472      26     82    481      39     22    193       —      —      —





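Both of these series consist of a trend plus a random element.
The trend in series 1 is given by
\[ T_i = 10^5 \exp\left(-\frac{i-1}{10}\right) \quad (i = 1, \ldots, 51), \]
and that in series 2 by
\[ T_i = 10^5\left[\exp\left(-\frac{i-1}{10}\right) + \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{(i-26)^2}{200}\right)\right] \quad (i = 1, \ldots, 51). \]
In both series the random element consists of the error committed in reading the
functions from three-place tables. Thus, in series 1,
\[ u_i = 10^5\left[\exp\left(-\frac{i-1}{10}\right) \text{ to three decimal places} - \exp\left(-\frac{i-1}{10}\right)\right], \]
and, in series 2,
\[ u_i = 10^5\left[\exp\left(-\frac{i-1}{10}\right) \text{ to three decimal places} - \exp\left(-\frac{i-1}{10}\right) + \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{(i-26)^2}{200}\right) \text{ to three decimal places} - \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{(i-26)^2}{200}\right)\right]. \]

The value of $\sigma^2$ is consequently 816.6 for series 1 and 1633.3 for series 2.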
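
The two series can be regenerated from their definitions. The sketch below assumes that each function is rounded separately to three decimal places and that the tabulated figure is the rounded value multiplied by 1000 (that is, $10^5$ times the value, expressed in hundreds); the function name is hypothetical.

```python
import math


def series_values(i):
    """Tabulated entries (series 1, series 2) for index i = 1, ..., 51."""
    decay = math.exp(-(i - 1) / 10)
    bell = math.exp(-(i - 26) ** 2 / 200) / math.sqrt(2 * math.pi)
    s1 = round(1000 * decay)               # exponential read to three places
    s2 = s1 + round(1000 * bell)           # normal ordinate rounded separately
    return s1, s2


for i in (1, 2, 13, 26, 51):
    print(i, series_values(i))
```

For i = 1, 2, 13, 26 and 51 this gives (1000, 1018), (905, 927), (301, 472), (82, 481) and (7, 25), in agreement with the table of § 3-1.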


Table 3-1a. Estimates of the trend components in series 1

                                            Estimates
 m      V_m           1st           2nd           3rd           4th           5th
 1    2591687*     2582023.2     2573539.8     2565029.0     2556536.6     2548067.4
      (±699)
 2     9663.8*       8483.4*       8510.8*       8492.4        8468.2        8442.6
      (±795)        (±42.1)       (±13.08)
 3     1180.4         -27.4          18.4          24.2*         25.6*         26.0*
      (±867)        (±33.1)       (±8.00)       (±3.12)       (±1.52)       (±0.84)
 4     1207.8         -45.8          -5.8          -1.4          -0.4          -0.1
      (±924)        (±27.6)       (±5.46)       (±1.80)       (±0.75)       (±0.34)
 5     1253.6         -40.0          -4.4          -1.0          -0.3          -0.3
      (±973)        (±23.9)       (±3.99)       (±1.14)       (±0.42)       (±0.18)
 6     1293.6         -35.6          -3.4          -0.7           0.0           —
      (±1015)       (±21.1)       (±3.06)       (±0.78)       (±0.26)          —
 7     1329.2         -32.2          -2.7          -0.7           —             —
      (±1053)       (±19.0)       (±2.43)       (±0.56)          —             —
 8     1361.4         -29.5          -2.0           —             —             —
      (±1087)       (±17.3)       (±1.98)          —             —             —
 9     1390.9         -27.5           —             —             —             —
      (±1118)       (±16.0)          —             —             —             —
10     1418.4          —              —             —             —             —
      (±1146)          —              —             —             —             —

* Significant at upper 1 % level.
Table 3-1b. Estimates of the trend components in series 2

                                              Estimates
 m       V_m            1st            2nd            3rd            4th            5th
 1   2173658.54*    2154432.32     2136689.82     2119064.59     2101548.55     2084135.64
 2     19226.22*      17742.50*      17625.23       17516.04       17412.91       17311.91
 3      1483.72         117.27         109.19*        103.13*        101.00*        100.23*
 4      1366.45           8.08           6.06           2.13           0.77           0.36
 5      1358.37           2.02           3.93           1.36           0.41           0.15
 6      1356.35          -1.91           2.57           0.95           0.26            —
 7      1358.26          -4.48           1.62           0.69            —              —
 8      1362.74          -6.10           0.93            —              —              —
 9      1368.84          -7.03            —              —              —              —
10      1375.87            —              —              —              —              —

3-2. Tables 3-1a and 3-1b give estimates of $V_1$–$V_{10}$ for the two series together with the
values of successive differences $\nabla^m V_p$ (m = 1 to 5).

The bracketed figures in table 3-1a give the standard errors of the estimates, calculated
using $\sigma^2 = 816.7$. Those for table 3-1b would be $\sqrt{2} = 1.4$ times these values.


3-3. Examination of the estimates for series 1 shows a trend component of about 26 in
$V_3$ and a negligible component in $V_4$. Similarly, for series 2, a trend component of about
100 exists in $V_3$ and a negligible component in $V_4$.

Estimators of $V_0$ may therefore be formed using values after $V_3$. Some possible combinations
are given in the following table:




















                 Series 1    Series 2
 V_4–V_9           919.0      1549.1
 V_5–V_10          893.8      1522.6
 V_6–V_10          969.4      1484.9
 V_7–V_10          969.4      1421.9
 V_8–V_10         1053.4      1349.9

The coefficients of variation for those estimates vary between roughly
$100\sqrt{2.8/41} = 26.1\,\%$
for the first pair and $100\sqrt{4.5/41} = 33.1\,\%$
for the last pair. None of these estimates differs from its theoretical value by more than
its standard error.


3-4. To set more exact confidence limits, it is necessary to calculate the efficiency factors,
$E$, of the estimators, using the values of $\lambda$ given in the table of section 2-8. These are given
by $E = 2/\lambda$.

Confidence limits may then be calculated in the same manner as for an ordinary estimate
of variance based upon $EN$ observations.

For instance, the efficiency value for the fourth estimate is 2/3.92 = 0.5102, so that the
effective number of observations is 41 × 0.5102 = 21. The 95 % limits for $\sigma^2$ for series 1 are
thus 969.4/1.69 = 573.6 and 969.4 × 2.04 = 1977.6.


4. CORRELATION ANALYSIS FOR RANDOM SERIES


4-1. For any two series it is possible to calculate not only the V’s estimating the residual
variances of the series, but also a set of C’s estimating the residual covariances between the
series. These C’s may be calculated in an exactly similar manner to the V’s except that each
square has to be replaced by a product.

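For example, if two series $x_i$ and $y_i$ $(i = 1, \ldots, N+10)$ are used, it is necessary to replace
$(\nabla^m x_i)^2$ by $(\nabla^m x_i)(\nabla^m y_i)$ in calculating $C_m$. Thus, for instance, we may set up a series of
equations

\[ C_{10} = (\nabla^{10}x_1)(\nabla^{10}y_1) + (\nabla^{10}x_2)(\nabla^{10}y_2) + \cdots + (\nabla^{10}x_N)(\nabla^{10}y_N), \]
\[ C_9 = \tfrac12(\nabla^9x_1)(\nabla^9y_1) + (\nabla^9x_2)(\nabla^9y_2) + \cdots + \tfrac12(\nabla^9x_{N+1})(\nabla^9y_{N+1}), \]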
etc.
4-2. A curious analogue exists between the formulae for the covariances and those for the
variances. The pth covariance may be written in the form

\[ C_p = \operatorname{cov}_0(x,y) - \frac{2p}{p+1}\operatorname{cov}_1(x,y) + \frac{2p(p-1)}{(p+1)(p+2)}\operatorname{cov}_2(x,y) - \cdots, \]
where
\[ \operatorname{cov}_0(x,y) = \operatorname{cov}(x_i, y_i), \]
\[ \operatorname{cov}_1(x,y) = \tfrac12[\operatorname{cov}(x_i, y_{i+1}) + \operatorname{cov}(x_{i+1}, y_i)], \]
\[ \operatorname{cov}_2(x,y) = \tfrac12[\operatorname{cov}(x_i, y_{i+2}) + \operatorname{cov}(x_{i+2}, y_i)]. \]
The formula for $V_p$ is the particular case of this where $x_i = y_i$.


4-3. Now, if the correlation between x and y is $\rho$, and no serial or lagged correlation exists
in the two series, it is not difficult to show that

\[ \operatorname{var}[\operatorname{cov}_0(x,y)] = (1+\rho^2)\,\frac{\sigma_1^2\sigma_2^2}{N}, \]
\[ \operatorname{var}[\operatorname{cov}_i(x,y)] = \frac{1+\rho^2}{2}\,\frac{\sigma_1^2\sigma_2^2}{N} \quad (i = 1, 2, \ldots), \]
\[ \operatorname{cov}[\operatorname{cov}_i(x,y), \operatorname{cov}_j(x,y)] = 0. \]

Since the factor $(1+\rho^2)$ occurs in all of these expressions, any process which minimizes
the variance of a combination of the $C_i$’s does so irrespective of the value of $\rho$. Consequently,
the processes derived for the $V_i$ (i.e. where $\rho = 1$) may also be applied to the $C_i$; the residual
covariance between x and y may be estimated by extrapolation in the same manner as the
residual variances were estimated.


4-4. For the same reason, correlation coefficients may be calculated using variances and
covariances which have been estimated in similar fashions. The variances of these will be
\[ \frac{(1-\rho^2)^2}{EN}, \]
where $E$ represents the efficiency of estimation.
The values of this factor are given in the following table, assuming that extrapolations
involving $V_{m+1}$ to $V_{m+p}$ or $C_{m+1}$ to $C_{m+p}$ are used in estimating the correlation coefficient. It
is derived by dividing the values of $\lambda$ in the table of § 2-8 into two (the minimum possible
value for $\lambda$).



Efficiency of estimated correlation coefficient 








m   p = 1   p = 2   p = 3   p = 4

0 0-667 0-781 0-826 0-851 
1 0-514 0-649 0-692 0-727 
2 0-433 0-567 0-631 0-671 
3 0-381 0-509 0-575 0-617 
4 0-344 0-466 0-532 0-575 
5 0-316 0-433 0-496 0-539 
6 0-294 0-406 0-467 0-510 























4-5. It seems unlikely that the distribution of correlation coefficients calculated by the
extrapolation method would differ very greatly from the distribution of ordinary correlation
coefficients, since, in the limit, the estimates are 100 % efficient. It would therefore seem to
be reasonable to test any correlation calculated by the extrapolation method as if it were
an ordinary correlation coefficient based upon $EN$ pairs of observations. This should give
a good approximation to the true state of affairs.
5. EXAMPLE 
5-1. An example may be provided using series 1 and 2 of the last section. These series have
a theoretical covariance of 816.7 and a theoretical correlation of 0.707.
Table 5-1a gives estimates of $C_1$ to $C_{10}$ and their differences.
Table 5-1a. Estimates of the covariances between series 1 and series 2

                                              Differences
 p       C_p             1st             2nd             3rd             4th             5th
 1   1561690.09    -1551285.16      1542029.34     -1532770.58      1523540.43     -1514343.00
 2     10410.93       -9255.82         9258.76        -9230.15         9197.43        -9163.66
 3      1155.11           2.94           28.61          -32.72           33.77          -34.15
 4      1158.05          31.55           -4.11            1.05           -0.38            0.19
 5      1189.60          27.44           -3.06            0.67           -0.19            0.06
 6      1217.04          24.38           -2.39            0.48           -0.13             —
 7      1241.42          21.99           -1.91            0.35             —               —
 8      1263.41          20.08           -1.56             —               —               —
 9      1283.49          18.52             —               —               —               —
10      1302.01            —               —               —               —               —

This table shows similar characteristics to tables 3-1a and 3-1b, trend being largely
eliminated in fourth and higher differences.
5-2. Estimates of covariances may be obtained in the same manner as estimates of
variances. For example, using $C_4$–$C_9$, we get

\[ C_0 = C_4 - 4\nabla C_4 + \frac{4\times5}{2!}\nabla^2 C_4 - \frac{4\times5\times6}{3!}\nabla^3 C_4 + \cdots \]
\[ = 1158.05 - 4\times31.55 - 10\times4.11 - 20\times1.05 - 35\times0.38 - 56\times0.19 \]
\[ = 945.8. \]
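
The arithmetic of this extrapolation is easily mechanized. The sketch below assumes that $\nabla$ denotes the forward difference on the suffix, $\nabla C_m = C_{m+1} - C_m$, which is the convention under which the tabulated differences reproduce the worked figures; the function name is hypothetical.

```python
from math import comb


def extrapolate_to_zero(values, m):
    """Extrapolate [C_m, C_{m+1}, ...] back to suffix 0.

    Uses C_m - m*D + m(m+1)/2! * D^2 - ..., where D^k is the k-th forward
    difference of the leading value, taking every difference available.
    """
    diffs = list(values)
    estimate = 0.0
    for k in range(len(values)):
        # diffs[0] holds the k-th forward difference of C_m at this point
        estimate += (-1) ** k * comb(m + k - 1, k) * diffs[0]
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return estimate


c4_to_c9 = [1158.05, 1189.60, 1217.04, 1241.42, 1263.41, 1283.49]
c8_to_c10 = [1263.41, 1283.49, 1302.01]
print(round(extrapolate_to_zero(c4_to_c9, 4), 1))
print(round(extrapolate_to_zero(c8_to_c10, 8), 1))
```

With the covariances of table 5-1a this returns approximately 945.8 from $C_4$–$C_9$ and 1046.6 from $C_8$–$C_{10}$.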
The following table gives a group of estimates obtained in this fashion:

 Estimate       Estimated
 using          covariance
 C_4–C_9           945.8
 C_5–C_10          962.2
 C_6–C_10          977.3
 C_7–C_10         1004.6
 C_8–C_10         1046.6













5-3. These estimates may be used to find the correlation between the two series. The 
following table gives a number of estimates of correlation coefficients together with the 
effective number of pairs of observations on which they are based. 








 Estimate using    Coefficient    No. of pairs
 C_1*                 0.658            27
 C_2*                 0.764            21
 C_3*                 0.873            18
 C_4                  0.901            16
 C_5                  0.912            14
 C_6                  0.919            13
 C_7                  0.924            12
 C_8                  0.928            11
 C_9                  0.930            11
 C_10                 0.932            10

 C_4–C_9              0.793            29
 C_5–C_10             0.811            27
 C_6–C_10             0.815            23
 C_7–C_10             0.856            21
 C_8–C_10             0.878            18

















* Biased by the existence of trend components. 


It may be seen that, apart from the first three estimates, which are biased by the trend, 
the estimates obtained using serial estimates of covariance are better than those individually 
obtained. 


5-4. Approximate limits may be set for any one coefficient by the use of the z transformation.
For instance, the value of z corresponding to the coefficient 0.793 is 1.080. This has
a standard error of roughly $1/\sqrt{29} = 0.186$, and its 95 % confidence limits are consequently
roughly 0.715 and 1.445. The corresponding 95 % confidence limits for $\rho$ are 0.61 and 0.89.

It is perhaps worth noting that the actual sample correlation between the two series when 
their trend is removed is 0-781. The agreement with this calculated correlation coefficient 
is very good. 
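
The limits obtained from the z transformation can be verified with a few lines of code. The sketch follows the approximation used in the text, a standard error of $1/\sqrt{n}$ with n the effective number of pairs, and assumes a normal deviate of 1.96 for the 95 % limits; the function name is hypothetical.

```python
import math


def z_transform_limits(r, n_pairs, deviate=1.96):
    """Approximate confidence limits for a correlation via z = artanh(r)."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n_pairs)            # standard error as taken in the text
    return math.tanh(z - deviate * se), math.tanh(z + deviate * se)


low, high = z_transform_limits(0.793, 29)
print(round(math.atanh(0.793), 3), round(low, 2), round(high, 2))
```

For the coefficient 0.793 on 29 effective pairs this gives z of about 1.080 and limits for $\rho$ of about 0.61 and 0.89.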


6. THE EFFECT OF SERIAL CORRELATION 


6-1. The theory given above is based upon the assumption that the series used are serially 
independent. Where this assumption does not hold, the accuracy of the estimates of 
variance, covariance and correlation will differ from the values given above. For this reason, 
it is necessary to consider whether serial correlation exists in the series used and, if so, what 
modifications in the ordinary procedures need to be adopted. 

In this and the next sections, attention will be largely concentrated upon the methods of 
answering three questions. These are: 


(1) How may serial correlations in the series be detected ? 

(2) May any existing serial correlations be fitted into a scheme of the moving average or 
autoregressive type? 

(3) What procedures exist for testing the correlation between series where both trend
and serial correlation exist?
6-2. Before starting on a more serious consideration of these questions, one important
point might be noted: If a large number of variate-differences are used in estimating the 
variances or covariances, these will give in general unbiased estimates of variances and 
covariances. 

This will be apparent if it is remembered that

\[ V_p = V_0\left[1 - \frac{2p}{p+1}r_1 + \frac{2p(p-1)}{(p+1)(p+2)}r_2 - \cdots\right], \]
and that the estimate of $V_0$ obtained from $V_{m+1}$ to $V_{m+p}$, by either the most exact or by the
extrapolation method, will be of the form

\[ V_0' = V_0[1 - a_1r_1 - a_2r_2 - a_3r_3 - \cdots], \]
where $a_i \to 0$ as $p \to \infty$.
A more rigorous proof is outlined in Appendix 1. It indicates a desirable asymptotic 
independence of the serial correlation in the series. 
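
The size of the bias in a single $V_p$ can be illustrated from the first of these expressions. In the sketch below the two correlation structures are assumptions chosen for illustration: a Markoff series with $\rho_k = 0.5^k$, and a series with $\rho_1 = 0.5$ and all higher serial correlations zero. Population values are used in place of the sample coefficients $r_k$, and the function name is hypothetical.

```python
def relative_vp(p, rho):
    """Expected V_p / V_0 when the k-th serial correlation is rho(k)."""
    total, w = 1.0, 1.0
    for k in range(1, p + 1):
        w *= (p - k + 1) / (p + k)         # p(p-1)...(p-k+1)/((p+1)...(p+k))
        total += 2.0 * (-1) ** k * w * rho(k)
    return total


def markoff(k):
    return 0.5 ** k


def lag_one_only(k):
    return 0.5 if k == 1 else 0.0


for p in (1, 5, 10):
    print(p, round(relative_vp(p, markoff), 3), round(relative_vp(p, lag_one_only), 3))
```

For p = 1, 5 and 10 the Markoff case gives about 0.500, 0.365 and 0.349, and the lag-one case about 0.500, 0.167 and 0.091, so that an individual $V_p$ of high order can understate $V_0$ very considerably when positive serial correlation is present.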


6-3. Of course, in practice, we cannot let p become very large, but, even so, an appreciable 
reduction in bias may be achieved by the use of several of the V’s (or C’s). 


For example, if V,, Vj, ...,V,, are used, the most exact method gives an estimate 


while the extrapolation method gives an estimate 





»_of,_.2. _ Mp-1) A p—1)(p-2) 
" =%|1 p+1" (p+l)(p+2)* (p+1)(p+2)(pt+3) ® | 


Both of these are liable to differ very little from Y. 


6-4. To illustrate the actual sizes of the bias, the following table gives the values that
would be obtained for σ² by the extrapolation methods on different series for which σ = 1:

                                        Series
Estimate      Markoff    Auto-         Auto-         Moving      Moving
using         ρ = 0.5    regressive    regressive    average     average
                         a = 1.1,      a = 1.1,      ρ₁ = 0.5    ρ₁ = -0.64, ...
                         b = 0.5       b = 0.8

V_1           0.500      0.267         0.389         0.500       1.640
V_5           0.365      0.067         0.039         0.167       2.352
V_10          0.349      0.058         0.031         0.091       2.648
V_1-V_10      0.796      0.841         1.179         0.909       1.035
V_2-V_10      0.668      0.594         1.071         0.818       1.096
V_5-V_10      0.477      0.175         0.201         0.545       1.445
V_8-V_10      0.386      0.078         0.047         0.273       2.040

These values are reasonably satisfactory for VV, and V,-V,o, but are poor for V.—Vo. 
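
The figures of 6-4 can be approximated numerically. The sketch below assumes that "the extrapolation method" amounts to polynomial (Lagrange) extrapolation to $i = 0$ through the expected values of $V_{m+1}, \dots, V_{m+p}$; the function names are hypothetical and the Markoff correlations are truncated at twenty terms.

```python
def expected_ratio(p, rho):
    """E(V_p)/V_0 for serial correlations rho = [rho_1, rho_2, ...]."""
    total, coeff = 1.0, 1.0
    for j, r in enumerate(rho, start=1):
        if j > p:
            break
        coeff *= (p - j + 1) / (p + j)
        total += 2 * (-1) ** j * coeff * r
    return total


def extrapolate_to_zero(points):
    """Lagrange extrapolation to i = 0 of a mapping {i: value}."""
    total = 0.0
    for i, value in points.items():
        weight = 1.0
        for j in points:
            if j != i:
                weight *= (0 - j) / (i - j)
        total += weight * value
    return total


def extrapolated_estimate(m, p, rho):
    """Expected extrapolate (in units of V_0) from V_{m+1}, ..., V_{m+p}."""
    points = {i: expected_ratio(i, rho) for i in range(m + 1, m + p + 1)}
    return extrapolate_to_zero(points)


markoff = [0.5 ** j for j in range(1, 21)]
print(round(extrapolated_estimate(0, 10, markoff), 3))  # from V_1 to V_10
print(round(extrapolated_estimate(1, 9, [0.5]), 3))     # moving average, V_2 to V_10
```

Under these assumptions the extrapolate recovers most of the variance lost in the individual $V_p$, provided that the low-order differences can be included.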
6-5. The first of the above queries may be reformulated to read: If $V_{m+1}, \dots, V_{m+p}$ may be
assumed to be without any trend component and $\rho_{q+1}, \rho_{q+2}, \dots$ may be assumed zero, how
may we estimate $\rho_1, \rho_2, \dots, \rho_q$?

Let us consider, first of all, the case $q = 1$.

One approach would be to express the means and variances of $V_{m+1}, \dots, V_{m+p}$ in terms of
$\rho_1$ and $\sigma^2$, and thence to obtain maximum likelihood, or for rough uses least squares, estimates
of $\rho_1$ and $\sigma^2$. These would be quite complicated to obtain and would still provide only a first
step towards simplifying the serial correlation picture; it would undoubtedly be necessary
to reconsider these estimates as $\rho_2$ and higher serial correlations were taken into account.
A simplification of the procedure, to allow more rapid approximate testing to be carried
out, is therefore desirable.

One possible simplification is to ignore in the first instance the dependence of the variances
of the V's on the $\rho_i$. It would then be possible to estimate $\sigma^2, \rho_1\sigma^2, \dots$ by a least squares
procedure.

An equivalent method to this is to estimate $\rho_1 V_0$ from the differences

$$V_{m+1} - V_{m+2},\quad V_{m+2} - V_{m+3},\quad \dots,\quad V_{m+p-1} - V_{m+p},$$

and to use this to correct for the bias in the value estimated for $V_0$ on the assumption that
$\rho_1 = 0$. This approach is the simplest mathematically and numerically.


6-6. We now consider a series of functions, $R_{11}, R_{12}, \dots$, defined by the formula

$$R_{1p} = \frac{(p+1)(p+2)}{1\cdot 2}\,(V_p - V_{p+1}).$$

It is not hard to show that

$$E(R_{1p}) = V_0\left[\rho_1 - \frac{4p}{p+3}\rho_2 + \frac{9p(p-1)}{(p+3)(p+4)}\rho_3 - \frac{16p(p-1)(p-2)}{(p+3)(p+4)(p+5)}\rho_4 + \dots\right],$$

so that if $\rho_2 = \rho_3 = \dots = 0$, we have

$$E(R_{1p}) = \rho_1\sigma^2,$$

$$\frac{N}{\sigma^4}\operatorname{cov}(R_{1p}, R_{1q}) = \frac{(2p+2q)!\,(p+2)!\,p!\,(q+2)!\,q!}{2\,(p+q+2)!\,(p+q)!\,(2p)!\,(2q)!}\cdot\frac{3pq+2p+2q+1}{4pq+2p+2q+1}.$$
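
The covariance expression of 6-6 is easily evaluated exactly. The sketch below is an illustration only; the function name `cov_r1` is hypothetical, and the result is expressed in units of $\sigma^4/N$.

```python
from fractions import Fraction
from math import factorial


def cov_r1(p, q):
    """N * cov(R_1p, R_1q) / sigma^4, evaluated as an exact fraction."""
    numerator = (factorial(2 * p + 2 * q) * factorial(p + 2) * factorial(p)
                 * factorial(q + 2) * factorial(q)
                 * (3 * p * q + 2 * p + 2 * q + 1))
    denominator = (2 * factorial(p + q + 2) * factorial(p + q)
                   * factorial(2 * p) * factorial(2 * q)
                   * (4 * p * q + 2 * p + 2 * q + 1))
    return Fraction(numerator, denominator)


# Variances of R_11, R_12, R_13 in units of sigma^4 / N
for p in (1, 2, 3):
    print(p, float(cov_r1(p, p)))
```

The diagonal values grow quickly with p, which is why the higher $R_{1p}$ carry little weight in any combined estimate.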
6-7. Using this latter value, it is possible to set up equations similar to those of § 2-6. These
may then be used to give the coefficients $c_i$ and $A$ for estimating the first serial covariance
and its variance. The following table gives the values of A:

          p
m      1       2       3       4

0     2.00    1.50    1.33    1.25
1     3.92    2.37    1.88    1.64
2     6.73    3.54    2.58    2.11
3    10.54    5.03    3.43    2.52
4    15.42    6.86    4.44    2.91
5    21.47    9.05    5.65    4.54
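
The equations of § 2-6 are not reproduced in this section. The sketch below therefore assumes that they are the usual minimum-variance conditions for an unbiased weighted mean of $R_{1,m+1}, \dots, R_{1,m+p}$, so that $A = 1/(\mathbf{1}'C^{-1}\mathbf{1})$ with $C$ the covariance matrix of 6-6. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def cov_r1(p, q):
    """N * cov(R_1p, R_1q) / sigma^4 from the formula of 6-6."""
    num = (factorial(2 * p + 2 * q) * factorial(p + 2) * factorial(p)
           * factorial(q + 2) * factorial(q) * (3 * p * q + 2 * p + 2 * q + 1))
    den = (2 * factorial(p + q + 2) * factorial(p + q) * factorial(2 * p)
           * factorial(2 * q) * (4 * p * q + 2 * p + 2 * q + 1))
    return Fraction(num, den)


def min_variance_a(m, n_terms):
    """Variance factor A of the assumed minimum-variance combination
    of R_{1,m+1}, ..., R_{1,m+n_terms}."""
    orders = range(m + 1, m + n_terms + 1)
    # Augmented matrix [C | 1]; solve C w = 1 by Gauss-Jordan elimination.
    aug = [[cov_r1(p, q) for q in orders] + [Fraction(1)] for p in orders]
    for col in range(n_terms):
        pivot = aug[col][col]
        aug[col] = [x / pivot for x in aug[col]]
        for row in range(n_terms):
            if row != col:
                factor = aug[row][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return 1 / sum(aug[row][n_terms] for row in range(n_terms))


print(float(min_variance_a(0, 1)), float(min_variance_a(0, 2)))
```

Under this assumption the first two entries of the row m = 0 are reproduced; agreement with the remaining entries has not been checked here.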























Biometrika 40 26 











Modifications to the variate-difference method 











6-8. These values of A might be compared with those obtained by extrapolation:

          p
m      2       3       4

0     1.52    1.39    1.32
1     2.56    2.15    1.85
2     4.15    3.29    2.88
3     6.37    4.88    4.03
4     9.28    6.93    5.38
5    12.95    9.56    8.71




















In this instance, direct extrapolation is not as efficient as previously, owing to the rapid
increase in the variance of $R_{1p}$ with p. This is of order (p + 1).

Better estimates may be obtained using the device suggested in § 2-14 of using an
alternative scale as a basis for extrapolation. In this instance, something like a quadratic
scale is suggested, and the use of the scale $(m + 3)^2$ appears to be fairly efficient.


6-9. It is fairly apparent that anything like a full analysis to estimate the higher serial
correlation coefficients would involve a great deal of work which could not in general be
justified. The prime need is for some fairly rapid method of gauging the likely serial
correlations in any series using their variate differences. The above analysis gives an indication of
how this may be approached.

In general, it is possible to define a series of functions

$$R_{mp} = \frac{(p+2m-1)(p+2m)}{(2m-1)\,2m}\,\nabla R_{m-1,p}$$

such that if the serial correlations higher than the mth are zero, the expectation of $R_{mp}$ is
$\rho_m\sigma^2$. These may be used for general investigational purposes.

The variances of the $R_{mp}$ increase very rapidly with p, so that the higher values are
unreliable and, for extrapolation purposes, cubic, quartic and higher-order scales are
required.
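
The recursion of 6-9 can be sketched as follows, taking $R_{0p} = V_p$ and reading $\nabla R_{m-1,p}$ as $R_{m-1,p} - R_{m-1,p+1}$; this reading is an assumption, chosen because it reproduces the leading entries of Table 7-1a. The function name `r_table` is hypothetical.

```python
def r_table(v, max_m):
    """Build rows[m][p - 1] = R_{m,p} from v = [V_1, V_2, ...].

    Row 0 holds the V_p themselves; each further row loses one entry.
    """
    rows = [list(v)]
    for m in range(1, max_m + 1):
        prev = rows[-1]
        row = []
        for p in range(1, len(prev)):
            scale = (p + 2 * m - 1) * (p + 2 * m) / ((2 * m - 1) * (2 * m))
            row.append(scale * (prev[p - 1] - prev[p]))
        rows.append(row)
    return rows


# First three variances of series 1 (column 0 of Table 7-1a)
rows = r_table([2591687.0, 9663.8, 1180.4], max_m=2)
print(rows[1])  # R_11, R_12
print(rows[2])  # R_21
```

Each further column needs one more variate difference, so a table of ten V's yields at most nine values of $R_{1p}$, eight of $R_{2p}$, and so on.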





7. EXAMPLES 


7-1. As a first illustration of these functions, series 1 and 2 may be used. Tables 7-1a and
7-1b give the estimates, $R_{mp}$, of serial correlation for these two series. Examination of these


Table 7-1a. Estimates of serial covariance for series 1

 p \ m        0            1st           2nd           3rd           4th           5th
  1       2591687      7746069.6    12825282.0    17776284.4    22551645.6    27102772.4
  2          9663.8      50900.4      127936.0      236115.6      376650.0      545402.0
  3          1180.4       -274.0        1445.5        1755.6        4785.0        6921.2
  4          1207.8       -687.0         714.0        -680.4         792.0        -723.3
  5          1253.6       -840.0         940.8       -1016.4        1149.7       -5888.9
  6          1293.6       -996.8        1218.0       -1429.1        3673.5
  7          1329.2      -1159.2        1542.8       -2559.4
  8          1361.4      -1327.5        2035.0
  9          1390.9      -1512.5
 10          1418.4

Table 7-1b. Estimates of serial covariance for series 2

 p \ m        0             1st            2nd            3rd            4th            5th
  1       2173658.54    6463296.96    10594736.62    14464143.22    17973894.37    21032116.20
  2         19226.22     106455.00      263205.75      484447.60      765799.30     1098472.34
  3          1483.72       1172.70        3680.25        7950.26       16840.88       28245.34
  4          1366.45        121.20         367.64        -623.28         545.49         233.08
  5          1358.37         42.42         575.40        -854.70         430.23         935.46
  6          1356.35        -53.48         808.50       -1009.14          29.32
  7          1358.26       -161.28        1037.85       -1018.16
  8          1362.74       -274.50        1233.65
  9          1368.84       -386.65
 10          1375.87





























tables is sufficient to show that, in both instances, trend apparently exists in the third 
variate-difference and that the first three serial correlation coefficients are likely to be 
negligible. At first sight, the largest would appear to be the first serial correlation coefficient 
of series 2, but an extrapolation using R,;-R,, gives an estimate of 117-32 for the serial 
covariance. This is well within the bounds of chance variation. 


7-2. This analysis was carried out for four other theoretical series. These were: 


Series 3. Eighty terms of a Markoff process with $\rho = 0.5$. In fact, the first eighty terms
of Kendall's (1949) series 11.


Series 4. Eighty terms of a moving average process for which $\rho_1 = 0.4$. This was
determined by $u_i = 2\epsilon_i + \epsilon_{i-1}$, where $\epsilon_i$ was the random element from Kendall's series 11.


Series 5. Eighty terms of the autoregressive process $u_i = 1.1u_{i-1} - 0.8u_{i-2} + \epsilon_i$. In fact,
the first eighty terms of Kendall's (1945) series 3 multiplied by 0.05.


Series 6. Series 5 plus a trend component determined from 


(i — 30)? 


a 
, = — i 49 ———_ . 
T; = exp 5o + sin 50° + 1000 


The estimates of serial correlation for these series are given in Tables 7-2a—7-2d. 


Table 7-2a. Estimates of serial covariance for series 3 (σ² = 1.33)











: rh 0 Ist Ond 3rd 4th 5th 
| 1 | 0812338 | 0-404673 | 0-267895 | 0-055272 0-035729 | 0-149848 
 -§ 0-677447 0-243936 0-228415 0-027483 — 0-086874 0-075819 
3 | 0-636791 | 0-152570 | 0-213692 | 0-081538 | —0-138569 | 0-019698 
4 | 0-621534 | 0-091515 | 0-179718 | 0-152082 | —0-149928 | 0-109313 
5 | 0615433 | 0-053004 | 0-129024 | 0-215688 | -—0-203984 | 0-290341 
6 | 0612909 | 0-031500 | 0-070200 | 0-288913 | —0-328416 as 
7 | 0-611784 | 0-022140 | 0-004538 | 0-389964 _ ind 
8 | 0-611169 | 0-021645 |—0-070455 antl wee — 
9 | 0610688 | 0-028050 wis aap ped oli 
10 | 0-610178 eke aa ws sine ee 
Table 7-2b. Estimates of serial covariance for series 4 (σ² = 5.0)








ix 0 Ist 2nd 3rd 4th 5th 
1 | 3561117 | 2503398 | 0-273930 |-0-306278 | -—0-239615 | 0-615038 
2 | 2726651 | 2-339040 | 0-492700 |-0-119911 | -—0-742828 | 0-328054 
3 | 2336811 | 2-141960 | 0-556938 | 0-342293 | -—0-966501 | 0-200368 
4 | 2122615 | 1-982835 | 0-414316 | 0-834330 | -—1-082098 | 0-452106 
5 | 1-990426 | 1-894053 | 0-136206 | 1-293402 | —1-305667 | 1-215830 
6 | 1-900233 | 1-871352 |—0-216540 | 1-762103 | —1-826737 — 
7 | 1-833399 | 1-900224 |-0-617018 | 2-324176 = — 
8 | 1780615 | 1-967535 |—1-063975 — _ — 
9 | 1-736892 | 2-064260 ~ ne ws $- 
10 | 1-699360 wi dhe sow = sie 





























Table 7-2c. Estimates of serial covariance for series 5 (σ² = 0.0905)

 p \ m       0            1st           2nd           3rd           4th           5th
  1      0.02762389    0.04650171    0.03204585    0.00373254   -0.02157966   -0.02510146
  2      0.01212332    0.02727420    0.02937975    0.02051672   -0.00104210   -0.01603533
  3      0.00757762    0.01552230    0.01838865    0.02116514    0.00989108   -0.00416622
  4      0.00602539    0.01026840    0.00956984    0.01612968    0.01229467    0.00632286
  5      0.00534083    0.00821772    0.00419328    0.01091376    0.00916798    0.00983670
  6      0.00494951    0.00751884    0.00121680    0.00762269    0.00495225
  7      0.00468098    0.00735660   -0.00051563    0.00609892
  8      0.00447663    0.00741285   -0.00168850
  9      0.00431190    0.00756635
 10      0.00417433


























Table 7-2d. Estimates of serial covariance for series 6 (σ² = 0.0905)

 p \ m       0            1st           2nd           3rd           4th           5th
  1      0.02943038    0.05195976    0.04107950    0.01625029   -0.00570086   -0.00601408
  2      0.01211046    0.02731206    0.02947215    0.02068429   -0.00078025   -0.01561504
  3      0.00755845    0.01552320    0.01839128    0.02116978    0.00986637   -0.00416615
  4      0.00600613    0.01026855    0.00957054    0.01614690    0.01226992    0.00656084
  5      0.00532156    0.00821772    0.00418824    0.01094148    0.00902555    0.00874398
  6      0.00493024    0.00751968    0.00120420    0.00770154    0.00527813
  7      0.00466168    0.00735912   -0.00054615    0.00607750
  8      0.00445726    0.00741870   -0.00171490
  9      0.00429240    0.00757460
 10      0.00415468























Examination of these tables shows that in no case could we be in doubt about the existence 
of serial correlation. There is very little difference between Tables 7-2c and 7-2d, and, from
these tables alone, there might easily be some doubt about the existence of a trend com- 
ponent in series 6. 

Table 7-2c indicates what is likely to be a common characteristic of oscillatory
autoregressive schemes: the direct estimates of serial covariances are greater in many cases than
the corresponding estimates of variance. This is due to the fact that the latter will tend
to be greatly reduced by the serial correlation, while the former will not be reduced to the
same extent and may even be increased.

7-3. As practical illustrations of this approach, it is convenient to consider Kendall’s 
(1943) ungraduated sheep series for 1867-1939 and Davis’s (1941) figures of monthly freight- 
car loadings for the United States. 

Estimates of serial covariances for the former series are given in Table 7-3a. 


Table 7-3a. Estimates of serial covariance for Kendall’s sheep series 








¥ - 0 Ist 2nd 3rd 4th 5th 

1 | 3177-8585 | 6065-7162 | 5180-0270 | 2607-2578 | 875-0051 1306-9226 
2 | 1155-9531 | 2957-7000 | 3317-7000 | 1926-6983 |— 194-2952 — 946-0034 
3 | 663-0031 | 1630-6200 | 2285-5402 | 2047-5931 | 450-7071 — 424-8454 
4 | 499-9411 | 977-6085 | 1432-3764 | 1818-1422 | 695-8102 — 1471-9133 
5 | 434-7672 | 670-7607 | 826-3290 | 1522-9500 | 1423-6794 | —2034-3657 
6 | 402-8305 | 532-9492 | 410-9790 | 1011-8856 | 2295-5504 ae 

7 | 383-7966 | 478-1520 | 181-0050 305-5624 oan ‘ot 

8 | 370-5146 | 458-4060 | 122-2430 a ae are 

‘9 | 360-3278 | 447-2930 ao oe os ie 

10 | 352-1952 fe a oes ons a: 





























There would seem to be little doubt about the existence of either a trend component or 
serial correlation in this series. 

Trend apparently may be regarded as being confined to the first two variate differences,
in which case an estimate of $\sigma^2$ may be derived from the third and higher variate differences.
This estimate is more than 3700.

There is a marked similarity between the general characteristics of these estimates and 
those of series 5 and 6. The large values taken by R,, and R,,, indicate that a second-order 
autoregressive scheme is likely to be needed to represent this series. 


7-4. The analysis of Davis’s freight-car loadings requires special adjustment because of 
the existence of seasonal variation. This could be easily carried out by an extension of the 
analysis of variance technique. It required the extraction of seasonal components from the 
formulae of § 2-3, using m = 11 instead of 10 as usual. Starting with 167 observations, this 
gave us a value for V, 167—11—12 = 144. 


Table 7-4a. Estimates for serial covariance for freight-car loadings 








‘ a 0 Ist 2nd 3rd 4th 5th 
1 900-290 835-608 415-660 265-937 320-355 483-472 
2 621-754 586-212 225-705 16-772 — 75-213 62-040 
3 524-052 495-930 216-720 63-571 — 117-513 68-763 
4 474-459 434-010 190-232 123-396 —157:184 | — 65-781 
5 445-525 393-246 149-100 190-080 —124-655 | —108-848 
6 426-799 368-396 97-260 234-828 — 78-006 aid 
7 413-642 355-428 43-890 258-830 at suas 
8 403-769 350-640 ~ 5-885 ya sis a 
9 395-977 351-175 atte ES im Gh 
10 389-592 ai se sie ms Line 
The estimates of serial covariance for this series, when seasonal variation has been
eliminated, are given in Table 7-4a.

There is clearly a high trend component in $V_1$, but apparently not much beyond this.
There is also clearly a high first serial correlation coefficient, a low second serial correlation
coefficient and not much beyond this. These values suggest the fitting of either a Markoff
process or a moving average scheme of length two. There is a fair degree of resemblance
between this table and Table 7-2a.


8. CORRELATION ANALYSIS FOR NON-RANDOM SERIES 


8-1. Before discussing correlation analysis for non-random series, it might be noted that
the serial covariances of the variate differences may be directly calculated using the $V_p$. If
$D_{pq}$ is used to denote the qth serial covariance for the pth variate difference, then

$$D_{pq} = \frac{\sum_{i=1}^{N-p-q} (\nabla^p x_i)(\nabla^p x_{i+q})}{N\binom{2p}{p}},$$

and the following identities apply:

$$D_{p1} = V_p - \frac{2p+1}{p+1}V_{p+1},$$

$$D_{p2} = V_p - \frac{4(2p+1)}{p+1}V_{p+1} + \frac{2(2p+1)(2p+3)}{(p+1)(p+2)}V_{p+2},$$

$$D_{p3} = V_p - \frac{9(2p+1)}{p+1}V_{p+1} + \frac{12(2p+1)(2p+3)}{(p+1)(p+2)}V_{p+2} - \frac{4(2p+1)(2p+3)(2p+5)}{(p+1)(p+2)(p+3)}V_{p+3}.$$

These may be used to calculate the serial correlations of the variate differences up to
any order.
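
The three identities of 8-1 translate directly into a short routine. In the sketch below the function name `serial_covariances` and the use of a dictionary keyed by the order of difference are illustrative assumptions.

```python
def serial_covariances(v, p):
    """Return (D_p1, D_p2, D_p3) from a mapping v[order] -> V_order.

    Only the identities that can be formed from the available V's are
    evaluated; the others are returned as None.
    """
    c1 = (2 * p + 1) / (p + 1)
    c2 = c1 * (2 * p + 3) / (p + 2)
    c3 = c2 * (2 * p + 5) / (p + 3)
    d1 = d2 = d3 = None
    if p + 1 in v:
        d1 = v[p] - c1 * v[p + 1]
    if p + 2 in v:
        d2 = v[p] - 4 * c1 * v[p + 1] + 2 * c2 * v[p + 2]
    if p + 3 in v:
        d3 = (v[p] - 9 * c1 * v[p + 1] + 12 * c2 * v[p + 2]
              - 4 * c3 * v[p + 3])
    return d1, d2, d3


# For a random series every V_p estimates the same variance (taken as 1 here)
flat = {order: 1.0 for order in range(1, 6)}
print(serial_covariances(flat, 1))
```

For a random series the first differences then show a first serial covariance of one half of the variance with reversed sign and none of higher order, as would be expected.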

8-2. A direct application of these formulae occurs when we consider the effect on the $V_p$ of
any linear transformation of the $x_i$. For instance, if the transformation

$$y_i = (x_i - \rho x_{i-1})/\sqrt{1+\rho^2}$$

is carried out, the revised $V_p$ are given by

$$V_p' = V_p - \frac{2\rho}{1+\rho^2}D_{p1}.$$

Similarly, a transformation of the type

$$y_i = (x_i - a x_{i-1} + b x_{i-2})/\sqrt{1+a^2+b^2}$$

gives values

$$V_p' = V_p - \frac{2a(1+b)}{1+a^2+b^2}D_{p1} + \frac{2b}{1+a^2+b^2}D_{p2}.$$
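
The multipliers of $D_{p1}$ and $D_{p2}$ in the second-order transformation are the quantities denoted A and B in § 9. A minimal sketch of the conversion follows; the function name `perturbation_coefficients` is hypothetical.

```python
def perturbation_coefficients(a, b):
    """Multipliers (A, B) of D_p1 and D_p2 for the transformation
    y_i = (x_i - a*x_{i-1} + b*x_{i-2}) / sqrt(1 + a^2 + b^2)."""
    norm = 1 + a * a + b * b
    coeff_a = -2 * a * (1 + b) / norm
    coeff_b = 2 * b / norm
    return coeff_a, coeff_b


# Autoregressive scheme of series 5: u_i = 1.1 u_{i-1} - 0.8 u_{i-2} + e_i
A, B = perturbation_coefficients(1.1, 0.8)
print(round(A, 2), round(B, 2))
```

The revised variances are then $V_p + A D_{p1} + B D_{p2}$.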


8-3. Correlation analysis may be carried out for non-random series in a similar fashion 
to that for random series. However, unless at least one of the series is random, the significance 
of the resulting coefficients is liable to be incorrectly estimated. 

As an alternative approach to testing the correlation between non-random series, it is 
therefore often convenient to try to transform the series so that they consist of uncorrelated 
random elements. This is the approach which will be considered. 





The major alternative to this approach is to adjust the degrees of freedom of the resulting
estimates by a factor which would take into account the serial correlation in the series.
This might be done using Bartlett’s (1935) adjustment. (See, for example, Quenouille 
(1952).) 

8-4. What the best method may be for determining a suitable transformation for serially 
correlated values when trend exists is not clear. However, one approach which appears to 
be reasonably satisfactory is to employ the transformation so that estimates of serial 
covariance of §6 disappear. This may be done using either the individual values or the 
extrapolates. The method will be clarified by examples. 


9. EXAMPLES 


9-1. It is not difficult to calculate the effect on Table 7-2a of adding $D_{p1}$ to the estimates,
$V_p$. This effect is given in Table 9-1a. The estimate of the first serial covariance from the first
four values of Table 7-2a is 0.673841, while the corresponding estimate of the increment
from Table 9-1a is 0.831860. Thus, 0.673841/0.831860 = 0.81 times Table 9-1a if subtracted
from Table 7-2a gives a zero estimate for the first serial correlation coefficient.


Table 9-1a

 p \ m      0           1st         2nd         3rd         4th         5th
  1     -0.203832    0.540117    0.229965    0.151753    0.099939    0.054527
  2     -0.383872    0.402138    0.121570    0.074023    0.055326   -0.019595
  3     -0.450894    0.353510    0.082915    0.039598    0.056662   -0.086132
  4     -0.486245    0.329820    0.066416    0.010752    0.106354
  5     -0.508233    0.315588    0.062832   -0.034368
  6     -0.523261    0.305116    0.075720
  7     -0.534158    0.295020
  8     -0.542353




























This corresponds to a transformation $y_i = (x_i - \rho x_{i-1})/\sqrt{1+\rho^2}$, where $2\rho/(1+\rho^2) = 0.81$,
i.e. $\rho = 0.51$, and, if this is made in the above instance, all of the serial correlations may
obviously be neglected.
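
The value of $\rho$ follows from the quadratic $c\rho^2 - 2\rho + c = 0$, where c is the estimated factor. A minimal sketch, taking the root of modulus less than one (the function name `rho_from_factor` is hypothetical):

```python
import math


def rho_from_factor(c):
    """Solve 2*rho / (1 + rho**2) = c for the root with |rho| <= 1."""
    if c == 0:
        return 0.0
    if abs(c) > 1:
        raise ValueError("no real rho exists when |c| > 1")
    return (1 - math.sqrt(1 - c * c)) / c


print(round(rho_from_factor(0.81), 2))
```

The other root is the reciprocal of this one and gives the same factor; a factor exceeding one in absolute value has no real solution, in which case a first-order transformation cannot remove the serial correlation.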


9-2. The existence of trend in the above series would have necessitated the use of higher
terms. For instance, if trend existed in $V_1$, the appropriate factor might have been estimated
using the second to fifth terms. The estimate would have been 0.55982/0.685038 = 0.817,
i.e. $\rho = 0.52$.


9-3. For more complicated schemes, it is necessary to calculate the perturbations
due to $D_{p1}$, $D_{p2}$, and to consider these jointly. These may be used to set up simultaneous
equations to find coefficients necessary to make the higher order serial covariances equal
to zero.

For instance, for series 6, the perturbations due to $D_{p1}$ and $D_{p2}$ are as in Table 9-3a.

Since more than the values in Table 9-3a would need to be subtracted from Table 7-2d to
obtain zero estimates, it is apparent that a Markoff scheme would not suffice as a means of
representing the series and that at least a second-order autoregressive scheme must be used.
Table 9-3a
Perturbation due to D_{p1}

™ 

P 0 Ist 2nd 3rd 4th 5th 
1 0-01126469 0-03525495 0-03410505 0-01768935 0-00511876 0-00283112 
2 | —0-00048696 0-01479192 0-02146980 0-01370809 0-00280239 |—0-00792403 
3 | —0-00295228 0-00620400 0-01412618 0-01196438 0-00820514 |—0-00277375 
4 | —0-00357268 0-00216795 0-00914102 0-00778722 0-00980538 — 
5 | —0-00371721 0-00020916 0-00654528 0-00362736 — om 
6 | —0-00372717 |—0-00088172 0-00555600 _ —_— ang 
7 | —0-00369568 |—0-00162252 _ — sr = 
8 | —0-00365061 —_ — —_— _ _ 

Perturbation due to D_{p2}

m 

P 0 Ist 2nd 3rd 4th 5th 
1 | —0-00544013 | —0-00659004 0-01186480 0-02297309 0-02621533 0-02378685 
2 |—0-00324345 |-—0-01370892 | —0-00454455 0-00258399 0-00675336 0-01197115 
3 |—0-00095863 |—0-01189110 | —0-00592883 |—0-00161810 | —0-00140879 0-00229327 
4 0-00023048 {|—0-01019715 | —0-00525462 |—0-00090090 | —0-00273183 _ 
5 0-00091029 |—0-00907116 | —0-00495432 0-00035806 _— — 
6 0-00134225 |—0-00824544 | —0-€0502470 — —_ — 
7 0-00163673 | —0-00757548 — _ = os 
8 0-00184716 — — — — = 








To estimate the appropriate proportions of Tables 9-3a and 9-3b to be added to Table
7-2d, extrapolates based upon the third to sixth variate differences might be used. These
give equations

    0.06902037 + 0.04286921A - 0.02305761B = 0,
    0.07588594 + 0.05124778A - 0.01022722B = 0,

or A = -1.40, B = 0.38.

These values might be compared with the true values A = -1.39, B = 0.56 and with those
obtained using extrapolates based upon the second to sixth variate differences, A = -1.39,
B = 0.11.
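
The pair of equations for A and B may be solved by Cramer's rule. The sketch below is an illustration only; the function name and the (constant, coefficient of A, coefficient of B) layout of each equation are assumptions made for the example.

```python
def solve_for_a_b(eq1, eq2):
    """Solve c + a*A + b*B = 0 for two equations given as (c, a, b)."""
    c1, a1, b1 = eq1
    c2, a2, b2 = eq2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("equations are not independent")
    coeff_a = (b1 * c2 - b2 * c1) / det
    coeff_b = (a2 * c1 - a1 * c2) / det
    return coeff_a, coeff_b


# Equations of 9-3 for series 6 (third to sixth variate differences)
A, B = solve_for_a_b((0.06902037, 0.04286921, -0.02305761),
                     (0.07588594, 0.05124778, -0.01022722))
print(round(A, 2), round(B, 2))
```

The determinant of this system is small relative to the coefficients, which is one way of seeing why B is poorly determined in this instance.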

9-4. The insensitivity of B in this instance is of no great concern so far as subsequent 
correlation analysis is concerned, since it is apparent that any inaccuracies in A and B will
give rise to schemes for which the serial correlation coefficients are small. It is, however, 
preferable to choose values of A and B corresponding to schemes for which the coefficients 
a and 6 are real so that subsequent analysis may be carried out if required. This necessitates 
that both 1+ A+B and 8B(1—B)—A should exceed zero. Thus, for instance, no scheme 
exists for which A = — 1-40, B = 0-38, but, if A = — 1-40 and B = 0-44, then both conditions 
are satisfied and a = 1-6, b = 1-0 gives a possible scheme. 

The adoption of this latter scheme instead of the true scheme means that the quantities
used in any correlation analysis are still serially correlated. The amount of this serial
correlation is, however, too small ($\rho_1 = 0.06$, $\rho_2 = -0.29$, $\rho_3 = -0.27$, $\rho_4 = -0.07$) to be of
any importance.

The variate difference analysis may then be derived by the appropriate multiples of the 
above table from Table 7-2d. This gives the following results: 











= 0 ist 2nd 3rd 4th 5th 
1 0-01126616 | —0-00029679 | —0-00144706 | 0-00159336 |—0-00133238 | 0-00048857 
2 | 001136509 | 0-00057145 | —0-00258517 | 0-00262992 |—0-00173212 | 0-00074591 
3 | 0-01126984 | 0-00160552 | —0-00399406 | 0-00370768 | —0-00224069 | 0-00072614 
4 | 001110929 | 0-00274667 |—0-00553892 | 0-00484840 | —0-00265962 oa 
5 | 0-01092618 | 0-00393359 |—0-00715505 | 0-00597672 wits es 
6 | 0-01073887 | 0-00512609 | —0-00878507 it < =k 
7 | 001055579 | 0-00629744 ms ios aoik wea 
8 | 0-01038086 ie te — — _ 
































This table indicates that very little serial correlation now exists in the series. The 
differences of the revised values of $V_p$ may then be used to gain a better idea of the trend in
the new series. 

9-5. A similar analysis was carried out using Kendall’s sheep series. This indicated clearly 
that a Markoff process could not be used to represent the series. The equations for deter- 
mining A and B, derived using the third to sixth variate differences were 


7437-9107 + 4292-2967A — 2408-4016B = 0, 
6891-9200 + 5904-1121A + 2639-7281B = 0, 


from which A = — 1-42, B = 0-56. 

The corresponding estimates using the second to sixth variate differences were A = — 1-35, 
B= 0-78. 

Both of these estimates are markedly greater than those for the artificial series, though
the former estimate is very near the expected values for the artificial series. This serves to
show that the differencing process $u_i - 1.1u_{i-1} + 0.8u_{i-2}$ will give rise to nearly random
components which may then be used in a correlation analysis. The following table gives the
values of $V_p$ if A = -1.4, B = 0.5, i.e. if a = 1.21, b = 0.76. It may be seen that with these
values there is very little evidence of serial correlation and trend does not seem to be of any
importance after the first variate difference:








\m 

p\ 0 Ist 2nd 3rd 4th 5th 
1 934-9359 72-5463 122-1597 211-8888 95-6965 457-7057 
2 910-7538 — 0-7502 — 29-1894 137-4583 — 278-7898 658-7784 
3 910-8788 10-9256 — 102-8280 310-9275 — 727-9566 | —4415-6187 
+ 909-7863 40-3050 — 232-3812 249-9381 1819-5162 _— 
5 907-0993 90-1909 — 315-6938 — 90-3925 _— jam 
6 902-8087 142-7166 — 291-0414 _ _ — 
7 897-7117 181-5221 — —_ _ _— 
8 892-6694 _— — -- _— _ 
This latter statement can, of course, be tested using differences of the first column. In
this instance there appear to be components of the order of 25.5 and 0.4 in $V_1$ and $V_2$.

9-6. A final analysis was carried out using Davis's freight-car loadings. Here again, the
serial correlations could not be accounted for by a Markoff process, since a proportion of
greater than one of $D_{p1}$ would have been needed to account for the serial correlation. A
second-order process fitted using the third to sixth variate differences gave A = -1.03, B = 0.38,
while one using the second to sixth variate differences gave A = -1.02, B = 0.39. The latter
values correspond to a = -0.55, b = 0.27, which is not unlike a moving average process for
which $\rho_1 = 0.44$. The following table gives the values of $V_p$ for this process.








0 Ist 2nd 3rd 4th 5th 
1 851-388 40-775 91-896 147-772 116-311] 232-453 
2 837-796 — 14-363 — 13-655 57-308 — 73-878 51-879 
3 840-190 — 8-900 — 44-356 103-275 — 109-250 80-180 
4 841-080 3-773 — 87-388 158-894 — 155-507 —— 
5 840-828 22-499 — 140-352 224-867 —_ -— 
6 839°757 45-891 — 201-680 — — a 
7 838-118 72-781 — oe = — 
8 836-096 —_— — — —_ —_ 





























There is apparently a low serial correlation in the revised series with a trend component 
of about 18 in the first variate difference. 


10. Discussion 


Since it is theoretically possible to reproduce any symmetrical derived second-moment
statistics using the quantities $V_p$, results obtainable using any method of general trend
elimination should be capable of reproduction using some modification of the
variate-difference method. For example, the use of symmetrical moving average difference formulae
is equivalent to using a smoothed second difference and, therefore, any results obtained by
this method should be obtainable using some modification to the variate difference method.

Of course, as has been shown (Spencer-Smith, 1947; Quenouille, 1949), the conventional 
methods of eliminating trend are liable to lead to biased conclusions concerning the nature 
of the series, and, therefore, any alternative method which was unbiased or biased to a lesser 
degree would seem to have a claim for consideration. 

The analysis described above is certainly more efficient for random series, while, for non- 
random series, it seems likely to be both less biased and more efficient than the usual method. 
It also has the added advantage of, in effect, reserving judgement on the extent and com- 
plexity of the trend until the serial correlation can be examined simultaneously. It cannot 
be denied that it requires fairly lengthy calculations, but these would not seem to be 
greatly more than the conventional analyses now employed and would seem more amenable 
to modern calculating equipment, whereby differencing and summing of squares can be 
carried out. 

At this stage, many more practical problems will have to be tackled before the utility or 
otherwise of this method is established. It is interesting, however, to note that the old and, 
in many ways, discredited variate-difference method can by some slight modifications 
overcome many of the criticisms to which it is subject. 
APPENDIX
It is necessary to consider the effect of extrapolation on

$$V_0\left[1 - \frac{2i}{i+1}\rho_1 + \frac{2i(i-1)}{(i+1)(i+2)}\rho_2 - \dots\right]$$

when this is known for $i = m+1, \dots, m+p$.
It is not hard to show that if

$$f(i) = \frac{1}{a+i} \quad \text{for } i = m+1, \dots, m+p,$$

then the value of $f(0)$ obtained by extrapolation is

$$\frac{1}{a} - \frac{(m+1)(m+2)\dots(m+p)}{a(a+m+1)(a+m+2)\dots(a+m+p)}.$$







, ca j! 
This follows, since Vif(m+1) = mimi hints .eemtiet 
and JO) = flem+1) + (m+ 1) Vol 1) + 2 NMED gape 4-1) +o. 


As a corollary of this result, the extrapolate to zero may be found for any function which may be 
split into partial fractions. Thus the extrapolate to zero for the function 


i(é—1)...(6-—@+1) 


- : : for i=m+l.,..., 
(i+ 1)(@+2)...(@+9) mF 





f= 


equals 








& (—1FM(g+k—1)i(m+k)! (m+p)! _ ¥ Teas FAN bre ci 
" eae k—-1) (m+p+k)!mik!(q—1)! ° 


kat (Q—A)IRM(kK-1)! mt (m+ pth)! x 
(q+k—1)!(m+k)!(m+p)! 


In this latter form it may be seen to be the (gq — 1)th difference of the function ‘a tp+hinilile- i” 





If g>m-+>p, this function is a polynomial of order g—p and, consequently (as might be expected), the 
extrapolate is zero. 
If g<m+p>, it may be noted that 


m m 1 
(m+k)! 1 (7)e (3) e+ ) 


ki(m+p+h! (p+k)! (pt+k+l)!* (p+k+2)! 





k-1)! —1)! ! 
ind that the (9 —~ ies ditieenss of S22 — 5; - yp ter se . 
(p+k+a)! (a+p—q)!(a+p+q)! 








Using these two facts, 


the extrapolate may be put into the alternative form 


(") p* (3) p*(pt+ 1)? 
gim+p)'(p—1)t |. SOA 2 Bi 
m\p—q)!(p+q)! [((p+1)?-@*)  [(p+1)?-@][(p+2)-97] 





(-1)9 


The proof may be completed by noting that this expression is of order 1/p, when m is zero and p is 
large, q being regarded as small, and that even when q is of the order of p (= Xp, say) the expression is 
of the order (1+X)-” at most. For other values of m, a similar result may be obtained by considering 
the modification to the expression for m = 0. 
MOMENTS OF THE RANK CORRELATION COEFFICIENT τ
IN THE GENERAL CASE*


By R. M. SUNDRUM 
Division of Research Techniques, London School of Economics 


1. INTRODUCTION 


The mean and variance of τ are known in the general case and for a number of special cases.
Some of these results are given below in terms of p, the probability of concordance. Two
observations $(x_i, y_i)$ and $(x_j, y_j)$ are said to be concordant if $(x_i - x_j)(y_i - y_j) > 0$; then it can
be seen easily that p is related to τ by the formula
$$\tau = 2p - 1 = p - q,$$
where $q = 1 - p$.
(a) General case 
If we write p’ for the sample estimate of p (and similarly use a prime to distinguish sample 

statistics from their corresponding parental values), we have the result of Hoeffding (1947) 
for the general case:

$$E(p') = p, \qquad (1)$$

$$\sigma^2(p') = \frac{2}{n(n-1)}\{pq + 2(n-2)(k - p^2)\}, \qquad (2)$$
where k is the probability that out of three members drawn from the population, one is 
concordant with the other two. 

(b) Special case : independence 
For this case, Kendall (1938) has obtained 
$$E(p') = \tfrac{1}{2}, \qquad (3)$$

$$\sigma^2(p') = \frac{2n+5}{18n(n-1)}. \qquad (4)$$
(c) Special case: normal correlation 


For this case, the mean and variance, due to Greiner and Esscher respectively, are given 
by Kendall (1948): 


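$$E(p') = \frac{1}{2} + \frac{1}{\pi}\sin^{-1}\rho, \qquad (5)$$

$$\sigma^2(p') = \frac{2}{n(n-1)}\left[\frac{1}{4} - \left(\frac{1}{\pi}\sin^{-1}\rho\right)^2 + 2(n-2)\left\{\frac{1}{36} - \left(\frac{1}{\pi}\sin^{-1}\frac{\rho}{2}\right)^2\right\}\right]. \qquad (6)$$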
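
As a consistency check between the general variance (2) and the independence result (4), the sketch below evaluates (2) with $p = \tfrac12$. The value $k = 5/18$ used for the independence case is not stated in the text; it is assumed here as the value for which (2) reduces to (4). The function names are hypothetical.

```python
from fractions import Fraction


def var_general(n, p, k):
    """Formula (2): variance of p' for sample size n."""
    q = 1 - p
    return Fraction(2, n * (n - 1)) * (p * q + 2 * (n - 2) * (k - p * p))


def var_independent(n):
    """Formula (4): variance of p' under independence."""
    return Fraction(2 * n + 5, 18 * n * (n - 1))


half, k_indep = Fraction(1, 2), Fraction(5, 18)
print(all(var_general(n, half, k_indep) == var_independent(n)
          for n in range(3, 30)))
```

Exact rational arithmetic is used so that the agreement is an identity in n rather than a numerical coincidence.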


Further, Hoeffding (1947) and Daniels & Kendall (1947) have proved that the distribution 
of t is asymptotically normal in the general case. Therefore, in the case of large samples, the 
above results are sufficient to fix the distribution completely. But it is notorious that the 
distribution of correlation coefficients in the non-null case tends to be extremely skew, 
particularly because of the limited range from —1 to +1; and that a very large sample is 


* Part of a thesis submitted for the degree of Ph.D. in the University of London. 
required before the skewness of the distribution is small enough to be neglected. The case
of medium-size samples, which are likely to be most important in practice, therefore 
requires some consideration of the departure from normality of this sampling distribution. 
The third and fourth moments of this distribution are known only for the null case of 
independence. The cumulant-generating function of the distribution in this case was derived 
by Moran (1950) and Silverstone (1950), and from their result we obtain 


H3(p') = 0, (7) 
100n4 + 328n3 — 127n? — 997n — 372 
In this paper, formulae for the third and fourth moments are derived for the general case, 
and for a special case of normal correlation. But first we prove a conjecture of Daniels & 
Kendall (1947) about the upper bound of the variance. 





2. UPPER BOUND OF THE VARIANCE OF t


Daniels & Kendall (1947) considered a population of ranked bivariate observations in 
which the sampling variance of t was maximized. They surmised that this occurred when the 
parent ranking was of the ‘canonical’ form, i.e. an arrangement where the X’s are in the 
natural order, a certain number of the Y’s at one end or the other of the sequence were in 
the correct natural order, and the others in the inverse order. A proof of this result is given 
below, in terms of p. 

Let the parent population consist of N bivariate observations, such that the probability
of concordance is p; this means that out of the N(N - 1) pairs (each pair counted twice)
there are P = pN(N - 1) concordant pairs. Write $P_i$ for the number of observations
concordant with the member whose X-rank is i; then $P = \sum_i P_i$. The problem is then to find a
subdivision of a given P into a set of $P_i$'s such that the sampling variance of p' is a maximum.
From (2) above, we see that in order to maximize the variance of p', for a given p, we have
to maximize k.

Take two quantities P, and P; such that P, > P;. The contribution to k from these two is 
clearly P,(P,—1) and P,(P;—1). Now consider the effect of reducing P; by one and increasing 
P, by one. The contribution to k is then 


P(P,+ 8 1)(Pj-2), ie. P(P,-1)+P(B—-1)+2(P,-B)+2, 
which is always greate®than the original contribution; therefore the effect of such an altera- 
tion is always to increase the value of k. Therefore, k is maximized by increasing as many 
P, as possible to their maximum value and reducing the other P; as much as possible. 
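The transfer step above can be checked numerically. The following is a minimal sketch, not part of the original argument: the helper names `concordance_counts`, `pair_sum` and `transfer_gain` are hypothetical, and the sum of $P_i(P_i-1)$ is used here only as the quantity the text calls the contribution to k.

```python
def concordance_counts(y_ranks):
    """P_i for each member when the X's are in natural order (position = X-rank)."""
    n = len(y_ranks)
    return [
        sum(1 for j in range(n)
            if j != i and (j - i) * (y_ranks[j] - y_ranks[i]) > 0)
        for i in range(n)
    ]


def pair_sum(counts):
    """Sum of P_i (P_i - 1), the contribution to k discussed in the text."""
    return sum(c * (c - 1) for c in counts)


def transfer_gain(p_i, p_j):
    """Change in the contribution when P_i is raised by one and P_j lowered by one."""
    before = p_i * (p_i - 1) + p_j * (p_j - 1)
    after = (p_i + 1) * p_i + (p_j - 1) * (p_j - 2)
    return after - before


# A ranking of five with the first two Y's in natural order and the rest inverted.
counts = concordance_counts([1, 2, 5, 4, 3])
print(counts, pair_sum(counts))
```

For every pair with $P_i \ge P_j$ the gain equals $2(P_i - P_j) + 2$, which is the strictly positive increment used in the proof.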

Now the maximum number of concordances with any single observation is clearly (N − 1),
and for every observation with this number of concordances there is a contribution of 1 to
every other $P_j$. Therefore the least possible value of $P_j$ is equal to the number of observations
with (N − 1) concordances. Further, in a ranking such as the following,

X-ranks         1     2     3    ...   N
Y-ranks        Y_1   Y_2   Y_3   ...  Y_N
Concordances   P_1   P_2   P_3   ...  P_N

for any $P_i = (N-1)$, the rank $Y_i$ must be the same as the X-rank i; all Y’s to the left must be
less than $Y_i$, and all Y’s to the right must be greater than $Y_i$. Therefore all observations to
the left of $(i, Y_i)$ must be concordant with the observations to the right of these, and so these
concordances are not reduced to their least possible value. In order to do this, the maximum
concordances must occur at one or the other end of the sequence. Therefore the ranking
which maximizes k is one in which, for a given P, as many $P_i$ as possible at one end are
assigned their maximum value (N − 1) and the remainder are assigned their minimum value
equal to the number of observations with (N − 1) concordances. It can be shown easily
that this leads to a canonical ranking. This can be done in all cases, with the possible
exception of one observation.


R. M. SUNDRUM








3. THIRD AND FOURTH MOMENTS IN THE GENERAL CASE 
To find
$$\mu_3(p') = E(p')^3 - 3E(p')^2\,E(p') + 2E^3(p')$$
we first consider $(p')^3$. Arguing in the manner of Hoeffding (1947), $(p')^3$ is the probability of
three pairs drawn from the sample, with replacement, being concordant. We may express
this probability in terms of the following probabilities involving members of the sample
drawn without replacement. Then $(p')^3$ is the sum of the following eight probabilities:

(i) The probability of drawing the same pair three times, $\binom{n}{2}^{-2}$, multiplied by the
probability that it is concordant, $p'$:
$$\binom{n}{2}^{-2} p'.$$

(ii) The probability that two pairs are identical, but the third has one member in
common with them, $3\binom{n}{2}^{-2}\,2(n-2)$, multiplied by the probability, termed $k'$ by Hoeffding,
that among three members drawn from a sample, one is concordant with the other two:
$$6\binom{n}{2}^{-2}(n-2)\,k'.$$

(iii) The probability that two pairs are identical, while the third is distinct,
$3\binom{n}{2}^{-2}\binom{n-2}{2}$, multiplied by the probability of drawing two concordant pairs, without
replacement, i.e. $(p^2)'$, since the corresponding probability for the population is $p^2$:
$$3\binom{n}{2}^{-2}\binom{n-2}{2}(p^2)'.$$

(iv) The probability that a pair has one member in common with the second pair, and
the other member in common with the third, $3\binom{n}{2}^{-2}\,2(n-2)(n-3)$, multiplied by the
probability, $l'$, that in a sample of four, two concordant members can be chosen in such
a way that each of them is concordant with one of the remaining two:
$$6\binom{n}{2}^{-2}(n-2)(n-3)\,l'.$$

(v) The probability that two pairs have one member in common, while the third is
completely distinct, $3\binom{n}{2}^{-2}\,2(n-2)\binom{n-3}{2}$, multiplied by the probability, $(kp)'$, i.e. the
probability of getting a set of three members with the property mentioned in (ii) above,
multiplied by the probability of a distinct concordant pair:
$$3\binom{n}{2}^{-2}(n-2)(n-3)(n-4)\,(kp)'.$$

(vi) The probability of drawing three pairs, all with one member in common, and with
the other members of each pair distinct, $2\binom{n}{2}^{-2}(n-2)(n-3)$, multiplied by the probability
$t'$ that among four members drawn from the sample without replacement, one is concordant
with the other three:
$$2\binom{n}{2}^{-2}(n-2)(n-3)\,t'.$$

(vii) The probability of drawing three pairs, any two of which have only one member
in common, i.e. if there are three members, A, B, C, the three pairs are AB, AC, BC,
$2\binom{n}{2}^{-2}(n-2)$, multiplied by the probability that among three members drawn from the
sample without replacement, each is concordant with the other two, $u'$:
$$2\binom{n}{2}^{-2}(n-2)\,u'.$$

(viii) The probability of drawing three distinct pairs, $\binom{n}{2}^{-2}\binom{n-2}{2}\binom{n-4}{2}$, multiplied
by the probability that they are all concordant, $(p^3)'$:
$$\binom{n}{2}^{-2}\binom{n-2}{2}\binom{n-4}{2}(p^3)'.$$
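The eight structure probabilities listed above can be checked by brute force for a small sample. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, each structure is identified by the number of distinct pairs and the degree sequence of the graph they form, and the counts are the numerators of the probabilities (to be divided by $\binom{n}{2}^{3}$).

```python
from collections import Counter
from itertools import combinations, product
from math import comb


def structure_counts(n):
    """Classify every ordered triple of pairs by (distinct pairs, degree sequence)."""
    pairs = list(combinations(range(n), 2))
    counts = Counter()
    for triple in product(pairs, repeat=3):
        edges = set(triple)
        degree = Counter(v for edge in edges for v in edge)
        key = (len(edges), tuple(sorted(degree.values(), reverse=True)))
        counts[key] += 1
    return counts


def expected_counts(n):
    """Counts implied by items (i)-(viii), before division by C(n,2)**3."""
    b = comb(n, 2)
    return {
        (1, (1, 1)): b,                                              # (i)
        (2, (2, 1, 1)): 6 * b * (n - 2),                             # (ii)
        (2, (1, 1, 1, 1)): 3 * b * comb(n - 2, 2),                   # (iii)
        (3, (2, 2, 1, 1)): 6 * b * (n - 2) * (n - 3),                # (iv)
        (3, (2, 1, 1, 1, 1)): 3 * b * (n - 2) * (n - 3) * (n - 4),   # (v)
        (3, (3, 1, 1, 1)): 2 * b * (n - 2) * (n - 3),                # (vi)
        (3, (2, 2, 2)): 2 * b * (n - 2),                             # (vii)
        (3, (1,) * 6): b * comb(n - 2, 2) * comb(n - 4, 2),          # (viii)
    }


print(dict(structure_counts(6)) == expected_counts(6))
```

Since the eight cases are exhaustive, the expected counts also add up to $\binom{n}{2}^{3}$, which gives a quick check on the coefficients.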
In a similar manner, $(p')^4$, the probability of drawing four concordant pairs from a
sample, replacing each pair, is the sum of the following twenty-three probabilities.

(i) The probability of drawing the same pair four times, $\binom{n}{2}^{-3}$, multiplied by the
probability that it is concordant, $p'$:
$$\binom{n}{2}^{-3} p'.$$

(ii) The probability of drawing three identical pairs, and the fourth with one in
common, $4\binom{n}{2}^{-3}\,2(n-2)$, multiplied by the probability, $k'$, that among three members
drawn from the sample, one is concordant with the other two:
$$8\binom{n}{2}^{-3}(n-2)\,k'.$$

(iii) The probability of drawing three identical pairs, and the fourth distinct,
$4\binom{n}{2}^{-3}\binom{n-2}{2}$, multiplied by the probability that two pairs drawn without replacement
are concordant, $(p^2)'$:
$$2\binom{n}{2}^{-3}(n-2)(n-3)\,(p^2)'.$$

(iv) The probability of drawing two sets of identical pairs, with one in common,
$3\binom{n}{2}^{-3}\,2(n-2)$, multiplied by the probability, $k'$:
$$6\binom{n}{2}^{-3}(n-2)\,k'.$$

(v) The probability of drawing two identical pairs, and the third and fourth with
the same member in common with the first two, and other members distinct,
$6\binom{n}{2}^{-3}\,2(n-2)(n-3)$, multiplied by the probability, $t'$, that among four members drawn
without replacement from a sample, one is concordant with the other three:
$$12\binom{n}{2}^{-3}(n-2)(n-3)\,t'.$$

(vi) The probability of drawing two identical pairs, the third with one member in
common with the first two, and the other member in common with the fourth pair,
$6\binom{n}{2}^{-3}\,4(n-2)(n-3)$, multiplied by the probability, $l'$, that among four members drawn
from a sample, two concordant members can be chosen in such a way that each of them is
concordant with one of the remaining two:
$$24\binom{n}{2}^{-3}(n-2)(n-3)\,l'.$$

(vii) The probability of drawing two identical pairs, with one member in common with
the third pair, and the other member in common with the fourth pair, $6\binom{n}{2}^{-3}\,2(n-2)(n-3)$,
multiplied by the probability, $l'$ (as in (vi)):
$$12\binom{n}{2}^{-3}(n-2)(n-3)\,l'.$$

(viii) The probability of drawing two identical pairs, with one member in common
with the third pair, and a fourth pair distinct, $12\binom{n}{2}^{-3}\,2(n-2)\binom{n-3}{2}$, multiplied by the
probability, $(kp)'$, as in (i) and (ii):
$$12\binom{n}{2}^{-3}(n-2)(n-3)(n-4)\,(kp)'.$$

(ix) The probability of drawing two identical pairs, with the same member in common
with the third and fourth pairs, whose other members are common, $6\binom{n}{2}^{-3}\,2(n-2)$,
multiplied by the probability, $u'$, that among three members drawn from the sample without
replacement, each is concordant with the other two:
$$12\binom{n}{2}^{-3}(n-2)\,u'.$$

(x) The probability of drawing each of two distinct pairs twice, $3\binom{n}{2}^{-3}\binom{n-2}{2}$,
multiplied by the probability, $(p^2)'$, that two pairs drawn without replacement are
concordant:
$$\tfrac{3}{2}\binom{n}{2}^{-3}(n-2)(n-3)\,(p^2)'.$$

(xi) The probability of drawing two identical pairs, and two other pairs with one in
common, $6\binom{n}{2}^{-3}\binom{n-2}{2}\,2(n-4)$, multiplied by the probability, $(kp)'$, as in (viii):
$$6\binom{n}{2}^{-3}(n-2)(n-3)(n-4)\,(kp)'.$$

(xii) The probability of drawing two identical pairs, and two distinct pairs,
$6\binom{n}{2}^{-3}\binom{n-2}{2}\binom{n-4}{2}$, multiplied by the probability, $(p^3)'$, of getting three concordant
pairs without replacement:
$$\tfrac{3}{2}\binom{n}{2}^{-3}(n-2)(n-3)(n-4)(n-5)\,(p^3)'.$$

(xiii) The probability of drawing three pairs with one member in common and others
distinct, and the fourth pair having the uncommon members of the others,
$2\binom{n}{2}^{-3}\,4(n-2)(n-3)\cdot 3$, multiplied by the probability, $a'$, that among four members
drawn from the sample without replacement, one is concordant with the other three, two of
which are concordant with each other:
$$24\binom{n}{2}^{-3}(n-2)(n-3)\,a'.$$

(xiv) The probability of drawing two distinct pairs, the third pair having one member
each from the first two pairs, and the fourth pair having the other members,
$3\binom{n}{2}^{-3}\,4\binom{n-2}{2}$, multiplied by the probability, $b'$, that among four members drawn from a
sample without replacement, each is concordant with two other members:
$$6\binom{n}{2}^{-3}(n-2)(n-3)\,b'.$$

(xv) The probability of drawing three pairs with one in common, the fourth pair having
one member in common with the uncommon members of the first three pairs, and another
member distinct, $4\binom{n}{2}^{-3}\,2(n-2)(n-3)\cdot 3(n-4)$, multiplied by the probability, $w'$, that
among five members drawn without replacement, one is concordant with three others, one
of which is concordant with the fifth:
$$24\binom{n}{2}^{-3}(n-2)(n-3)(n-4)\,w'.$$

(xvi) The probability of drawing a pair with one member in common with a second pair,
and other member in common with a third pair, while the fourth pair has an uncommon
member of the second or third, and a distinct member, $3\binom{n}{2}^{-3}\,4(n-2)(n-3)\,2(n-4)$,
multiplied by the probability, $x'$, that among five members drawn without replacement
from a sample, one is concordant with a second, which is concordant with a third, which is
concordant with a fourth, which is concordant with the fifth:
$$24\binom{n}{2}^{-3}(n-2)(n-3)(n-4)\,x'.$$

(xvii) The probability of drawing one pair with one member in common with a second
pair and other member in common with third pair, while a fourth pair is distinct,
$6\binom{n}{2}^{-3}\,4(n-2)(n-3)\binom{n-4}{2}$, multiplied by the probability, $(lp)'$, as in (vi) and (i):
$$12\binom{n}{2}^{-3}(n-2)(n-3)(n-4)(n-5)\,(lp)'.$$

(xviii) The probability of drawing three pairs, any two of which have a member in
common, while a fourth pair is distinct, $4\binom{n}{2}^{-3}\,2(n-2)\binom{n-3}{2}$, multiplied by the
probability, $(up)'$, as in (ix) and (i):
$$4\binom{n}{2}^{-3}(n-2)(n-3)(n-4)\,(up)'.$$

(xix) The probability of drawing three distinct pairs, one of which has a member in
common with the fourth pair, $12\binom{n}{2}^{-3}\binom{n-2}{2}\binom{n-4}{2}(n-6)$, multiplied by the probability,
$(kp^2)'$, as in (ii) and (iii):
$$3\binom{n}{2}^{-3}(n-2)(n-3)(n-4)(n-5)(n-6)\,(kp^2)'.$$

(xx) The probability of drawing four pairs with one member in common and other
members distinct, $2\binom{n}{2}^{-3}(n-2)(n-3)(n-4)$, multiplied by the probability, $y'$, that among
five members drawn from a sample without replacement, one is concordant with the other four:
$$2\binom{n}{2}^{-3}(n-2)(n-3)(n-4)\,y'.$$

(xxi) The probability of drawing three pairs with one member in common, and a fourth
pair distinct, $4\binom{n}{2}^{-3}\,2(n-2)(n-3)\binom{n-4}{2}$, multiplied by the probability, $(tp)'$, as in (v)
and (i):
$$4\binom{n}{2}^{-3}(n-2)(n-3)(n-4)(n-5)\,(tp)'.$$

(xxii) The probability of drawing two pairs with one member in common, and two
other pairs with a member in common, $3\binom{n}{2}^{-3}\,2(n-2)\binom{n-3}{2}\,2(n-5)$, multiplied by the
probability, $(k^2)'$, as in (ii):
$$6\binom{n}{2}^{-3}(n-2)(n-3)(n-4)(n-5)\,(k^2)'.$$

(xxiii) The probability of drawing four distinct pairs, $\binom{n}{2}^{-3}\binom{n-2}{2}\binom{n-4}{2}\binom{n-6}{2}$,
multiplied by the probability $(p^4)'$, of drawing four concordant pairs without replacement:
$$\tfrac{1}{8}\binom{n}{2}^{-3}(n-2)(n-3)(n-4)(n-5)(n-6)(n-7)\,(p^4)'.$$
Collecting these terms together, we get the following results:

$$\mu_3'(p') = \binom{n}{2}^{-2}\{p + (n-2)(6k+2u) + (n-2)(n-3)(\tfrac32 p^2 + 6l + 2t)$$
$$\qquad + (n-2)(n-3)(n-4)\,3kp + \tfrac14(n-2)(n-3)(n-4)(n-5)\,p^3\}, \qquad (9)$$

$$\mu_3(p') = \binom{n}{2}^{-2}\{(6l + 2t - 18kp + 10p^3)\,n^2 + (6k + 2u - 6p^2 - 30l - 10t + 72kp - 34p^3)\,n$$
$$\qquad + (p + 9p^2 + 30p^3 - 12k - 4u + 36l + 12t - 72kp)\}, \qquad (10)$$

$$\mu_4'(p') = \binom{n}{2}^{-3}\{p + (n-2)(14k + 12u)$$
$$\qquad + (n-2)(n-3)(\tfrac72 p^2 + 12t + 36l + 24a + 6b)$$
$$\qquad + (n-2)(n-3)(n-4)(18kp + 24w + 24x + 4up + 2y)$$
$$\qquad + (n-2)(n-3)(n-4)(n-5)(\tfrac32 p^3 + 12lp + 4tp + 6k^2)$$
$$\qquad + (n-2)(n-3)(n-4)(n-5)(n-6)\,3kp^2$$
$$\qquad + \tfrac18(n-2)(n-3)(n-4)(n-5)(n-6)(n-7)\,p^4\}, \qquad (11)$$

$$\mu_4(p') = \binom{n}{2}^{-3}\{6(k-p^2)^2\,n^4$$
$$\qquad + (6kp + 24w + 24x + 2y - 6p^3 - 96lp - 32tp - 84k^2 + 270kp^2 - 108p^4)\,n^3$$
$$\qquad + (\tfrac32 p^2 + 12t + 36l + 24a + 6b - 126kp - 216w - 216x$$
$$\qquad\quad - 24up - 18y + 75p^3 + 720lp + 240tp + 426k^2 - 1446kp^2 + \tfrac{1011}{2}p^4)\,n^2$$
$$\qquad + (14k + 12u - \tfrac{31}{2}p^2 - 60t - 180l - 120a - 30b$$
$$\qquad\quad + 444kp + 624w + 624x + 96up + 52y - 213p^3$$
$$\qquad\quad - 1776lp - 592tp - 924k^2 + 2988kp^2 - \tfrac{1887}{2}p^4)\,n$$
$$\qquad + (p - 28k - 24u + 21p^2 + 72t + 216l + 144a + 36b - 432kp - 576w$$
$$\qquad\quad - 576x - 96up - 48y + 180p^3 + 1440lp + 480tp + 720k^2$$
$$\qquad\quad - 2160kp^2 + 630p^4)\}. \qquad (12)$$
Considering only the dominant terms, we find
$$\sigma^2(p') \sim 4(k-p^2)\,n^{-1}, \qquad (13)$$
$$\mu_3(p') \sim \text{const.}\; n^{-2}, \qquad (14)$$
$$\mu_4(p') \sim 48(k-p^2)^2\,n^{-2}, \qquad (15)$$

so that $\beta_1 \to 0$ and $\beta_2 \to 3$ as $n \to \infty$.
This agrees therefore with the general result for higher moments given by Hoeffding
(1947) in his proof of the asymptotic normality of the distribution in the non-null case.
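The limits of $\beta_1$ and $\beta_2$ can be read off from (13) to (15) directly. The following short working is a sketch added for clarity; it assumes $k \neq p^2$, so that the leading term of the variance does not vanish.

$$\beta_2 = \frac{\mu_4(p')}{\sigma^4(p')} \sim \frac{48(k-p^2)^2\,n^{-2}}{16(k-p^2)^2\,n^{-2}} = 3,
\qquad
\beta_1 = \frac{\mu_3^2(p')}{\sigma^6(p')} \sim \frac{\text{const.}\; n^{-4}}{64(k-p^2)^3\,n^{-3}} = O(n^{-1}) \to 0.$$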

These formulae for the third and fourth moments involve ten parameters, namely, 
p, k, l, t, u, a, b, w, x and y. In the general case when nothing is known about the parent 
population, we have no information about these parameters, except that provided by the 
sample itself. Then we can use sample estimates of these quantities, suitably corrected for 
bias where necessary. These computations may, however, be much too tedious for common 
applications. It is therefore interesting to study the values of these parameters in special 
cases. 

4. SPECIAL CASE OF INDEPENDENCE 


We first evaluate the ten parameters in the null case of independence, partly because it
provides a check on our algebra, and partly because it is useful to have these values in
extending to the non-null case.

First we collect some preliminary results. Let $X_i$ (i = 1, 2, ...) be a series of random
variables, independently and identically distributed according to some common distribution
function F(x). Define $u_{ij} = X_i - X_j$. Then we need the values of quantities like

$$E(\operatorname{sgn} u_{12}\,\operatorname{sgn} u_{13}),$$

where $\operatorname{sgn} u_{ij} = 1$ if $u_{ij} > 0$,
                = 0 if $u_{ij} = 0$,
                = −1 if $u_{ij} < 0$.

We consider also $v_{ij} = Y_i - Y_j$, where $Y_i$ (i = 1, 2, ...) is a series of independently and
identically distributed random variables.

Obviously, $E(\operatorname{sgn} u_{ij}) = 0$. Given $X_1$, the probability that both $X_2$ and $X_3$ are less than
$X_1$ is $\{F(x_1)\}^2$; and the probability that both $X_2$ and $X_3$ are greater than $X_1$ is $\{1 - F(x_1)\}^2$.
Therefore

$$\Pr\{(X_1-X_2)(X_1-X_3) > 0\} = \int F^2(x)\,dF(x) + \int \{1-F(x)\}^2\,dF(x) = \tfrac23,$$

$$\Pr\{(X_1-X_2)(X_1-X_3) < 0\} = \tfrac13,$$

so that $E(\operatorname{sgn} u_{12}\,\operatorname{sgn} u_{13}) = \tfrac13$.
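The value 1/3 can be confirmed by enumerating the six equally likely orderings of three observations. This is a minimal sketch with hypothetical helper names; it assumes a continuous distribution, so that ties have probability zero.

```python
from fractions import Fraction
from itertools import permutations


def sgn(z):
    """Sign function as defined in the text."""
    return (z > 0) - (z < 0)


def expected_sign_product():
    """E(sgn u12 sgn u13) over the six equally likely orderings of X1, X2, X3."""
    orders = list(permutations(range(3)))
    total = sum(sgn(x[0] - x[1]) * sgn(x[0] - x[2]) for x in orders)
    return Fraction(total, len(orders))


e = expected_sign_product()
# Under independence the X-part and Y-part factorize, and the single-sign terms vanish.
k_null = Fraction(1, 4) * e * e + Fraction(1, 4)
print(e, k_null)
```

The second quantity is the null value of k obtained when the same expectation is used for the u's and the v's.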

We can express the ten parameters in terms of the u’s and the v’s. Under the null
hypothesis, every u is independent of any v. Hence, we can evaluate all these parameters. As an
example, we evaluate the quantity k by this method:

$$k = E\{\tfrac12(\operatorname{sgn} u_{12}\operatorname{sgn} v_{12} + 1)\,\tfrac12(\operatorname{sgn} u_{13}\operatorname{sgn} v_{13} + 1)\}$$
$$\;= \tfrac14 E(\operatorname{sgn} u_{12}\operatorname{sgn} u_{13})\,E(\operatorname{sgn} v_{12}\operatorname{sgn} v_{13})$$
$$\quad + \tfrac14\{E(\operatorname{sgn} u_{12})E(\operatorname{sgn} v_{12}) + E(\operatorname{sgn} u_{13})E(\operatorname{sgn} v_{13})\} + \tfrac14 = \tfrac{5}{18}.$$

In this way, we obtain for the null case:

p = 1/2,     u = 1/6,     w = 163/1800,
k = 5/18,    a = 7/72,    x = 19/225,
l = 11/72,   b = 7/72,    y = 8/75,
t = 1/6.

However, this method of evaluating the parameters does not lend itself simply to
generalization to the non-null case. We therefore consider an alternative procedure used
by Hoeffding (1947) to evaluate k, and based on the consideration that in the null case, all
permutations of one set of ranks relative to the other are equally probable. In the following,
we write P(1423) to indicate the probability that the ranks of the Y’s occur in that order
when the X’s are arranged in order of magnitude.
(i) p = 1/2,
(ii) k = P(123) + (1/3)P{(132) + (213)} = 5/18,
(iii) l = P(1234) + (1/2)P{(1243) + (1324) + (2134)}
+ (1/3)P(2143) + (1/6)P{(1342) + (1423) + (2314) + (2413) + (3124)}
= 11/72,
(iv) t = P(1234) + (1/2)P{(1243) + (1324) + (2134)}
+ (1/4)P{(1342) + (1423) + (1432) + (2314) + (3124) + (3214)}
= 1/6,
(v) u = P(123) = 1/6,
(vi) a = P(1234) + (1/3)P{(1243) + (1324) + (2134)}
+ (1/12)P{(1342) + (1423) + (2314) + (3124)}
= 7/72,
(vii) b = P(1234) + (1/3)P{(1243) + (1324) + (2134) + (2143)}
= 7/72,
(viii) w = P(12345) + $P{(12354) + (12435) + (13245) + (21345)} 
+ }P{(12453) + (12534) + (13254) + (13425) + (14235) 
+ (21354) + (21435) + (23145) + (31245)} 
+}P{(12543) + (14325) + (32145)} 
+ B;P{(13452) + (15234) + (23415) + (41235)} 
+ 4P{(13524) + (14253) + (21453) + (21534) + (23154) + (24135) 
+ (31254) + (31425)} 
+ qh P{(13542) + (14352) + (15243) + (15324) + (24315) 
+ (32415) + (41325) + (42135)} 
+ jsP{(21543) + (32154)} 
+ jp P{(14523) + (23514) + (24153) + (25134) + (31452) 
+ (31524) + (34125) + (41253)} 
+ ds P{(14532) + (15342) + (15423) + (25143) + (25314) + (31542) + (32514) 
+ (34215) + (41352) + (42153) + (42315) + (43125)} 
+ dy P{(24513) + (25413) + (34152) + (35124) + (35214) + (41523) + (41532) 
+ (43152)} 
= 163/1800,
(ix) x = P(12345) + (3/5)P{(12354) + (12435) + (13245) + (21345)}

+ =8,P{(12453) + (12534) + (13425) + (14235) + (23145) + (31245)} 

+ zis P{(12543) + (13452) + (14325) + (15234) + (21543) + (23415) + (32145) 
+ (32154) + (41235)} 

+ 2P{(13254) + (21354) + (21435)} 

+ 4P{(13524) + (14253) + (24135) + (31425)} 

+ ygP{(14523) + (23514) + (25134) + (31452) + (34125) + (41253)} 

+ gP{(21453) + (21534) + (23154) + (31254)} 

+ gyP{(24153) + (31524)} 

+ pg P{(13542) + (14352) + (15243) + (15324) + (24315) + (24513) + (25143) 
+ (31542) + (32415) + (32514) + (34152) + (35124) + (41325) + (41523) 
+ (42135) + (42153)} 

+ dsP{(25314) + (35142) + (41352) + (42513)} 

= 19/225,
(x) y = P(12345) + (3/5)P{(12354) + (12435) + (13245) + (21345)}
+ (2/5)P{(12453) + (12534) + (12543) + (13425) + (14235) + (14325) + (23145)
+ (31245) + (32145)}

+ (1/5)P{(13254) + (13452) + (13524) + (13542) + (14253) + (14352)
+ (14523) + (14532) + (15234) + (15243) + (15324)
+ (15342) + (15423) + (15432) + (21354) + (21435) + (23415)
+ (24135) + (24315) + (31425) + (32415) + (34125) + (34215)
+ (41235) + (41325) + (42135) + (42315) + (43125) + (43215)}

= 8/75.
When we substitute these values for the ten parameters in (10) and (12), we get the same 
result as (7) and (8). 
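The substitution check can be repeated mechanically. The sketch below is an illustration under stated assumptions, not the author's computation: the ten parameters are obtained by brute force as the probability that every pair in a small graph of members is concordant when the X-ranking and Y-ranking are independent uniform permutations, and the coefficients of (10) and (12) are those given above. The names `GRAPHS`, `null_value`, `mu3`, `mu4` and `mu4_null` are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import permutations
from math import comb

# Pairs of members (labelled 0..4) that must all be concordant for each parameter.
GRAPHS = {
    "p": [(0, 1)],
    "k": [(0, 1), (0, 2)],
    "u": [(0, 1), (0, 2), (1, 2)],
    "t": [(0, 1), (0, 2), (0, 3)],
    "l": [(2, 0), (0, 1), (1, 3)],
    "a": [(0, 1), (0, 2), (0, 3), (1, 2)],
    "b": [(0, 1), (1, 2), (2, 3), (3, 0)],
    "w": [(0, 1), (0, 2), (0, 3), (3, 4)],
    "x": [(0, 1), (1, 2), (2, 3), (3, 4)],
    "y": [(0, 1), (0, 2), (0, 3), (0, 4)],
}


def null_value(edges):
    """Probability that all listed pairs are concordant under independence."""
    size = 1 + max(max(e) for e in edges)
    perms = list(permutations(range(size)))
    hits = sum(
        all((x[i] - x[j]) * (y[i] - y[j]) > 0 for i, j in edges)
        for x in perms for y in perms
    )
    return Fr(hits, len(perms) ** 2)


NULL = {name: null_value(edges) for name, edges in GRAPHS.items()}


def mu3(n, v):
    """Third central moment of p' from (10)."""
    p, k, l, t, u = (v[s] for s in "pkltu")
    brace = ((6*l + 2*t - 18*k*p + 10*p**3) * n**2
             + (6*k + 2*u - 6*p**2 - 30*l - 10*t + 72*k*p - 34*p**3) * n
             + (p + 9*p**2 + 30*p**3 - 12*k - 4*u + 36*l + 12*t - 72*k*p))
    return brace / comb(n, 2) ** 2


def mu4(n, v):
    """Fourth central moment of p' from (12)."""
    p, k, l, t, u, a, b, w, x, y = (v[s] for s in "pkltuabwxy")
    c4 = 6 * (k - p**2) ** 2
    c3 = (6*k*p + 24*w + 24*x + 2*y - 6*p**3 - 96*l*p - 32*t*p
          - 84*k**2 + 270*k*p**2 - 108*p**4)
    c2 = (Fr(3, 2)*p**2 + 12*t + 36*l + 24*a + 6*b - 126*k*p - 216*w - 216*x
          - 24*u*p - 18*y + 75*p**3 + 720*l*p + 240*t*p + 426*k**2
          - 1446*k*p**2 + Fr(1011, 2)*p**4)
    c1 = (14*k + 12*u - Fr(31, 2)*p**2 - 60*t - 180*l - 120*a - 30*b
          + 444*k*p + 624*w + 624*x + 96*u*p + 52*y - 213*p**3
          - 1776*l*p - 592*t*p - 924*k**2 + 2988*k*p**2 - Fr(1887, 2)*p**4)
    c0 = (p - 28*k - 24*u + 21*p**2 + 72*t + 216*l + 144*a + 36*b
          - 432*k*p - 576*w - 576*x - 96*u*p - 48*y + 180*p**3
          + 1440*l*p + 480*t*p + 720*k**2 - 2160*k*p**2 + 630*p**4)
    return (c4*n**4 + c3*n**3 + c2*n**2 + c1*n + c0) / comb(n, 2) ** 3


def mu4_null(n):
    """Null-case fourth moment, written with the polynomial numerator of (8)."""
    num = 100*n**4 + 328*n**3 - 127*n**2 - 997*n - 372
    return Fr(num, 2700 * n**3 * (n - 1)**3)


print(NULL)
print(mu3(10, NULL), mu4(10, NULL) == mu4_null(10))
```

Exact rational arithmetic is used so that agreement with (7) and (8) is an identity rather than a rounding coincidence.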


5. SPECIAL CASE OF NORMAL CORRELATION 


We now consider the values of the ten parameters in the case when the parent population
is bivariate normal. For this case we already have, from (5),

$$p = \frac12 + \frac1\pi \sin^{-1}\rho. \qquad (16)$$

Further, by comparing (2) and (6) we have

$$k = \Big\{\frac{5}{18} + \frac1\pi \sin^{-1}\rho + \frac{1}{\pi^2}(\sin^{-1}\rho)^2 - \frac{1}{\pi^2}(\sin^{-1}\tfrac12\rho)^2\Big\}. \qquad (17)$$

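One other parameter can be evaluated exactly by the same methods used to deduce the
above, i.e.

$$u = \Big\{\frac16 + \frac{3}{2\pi}\sin^{-1}\rho - \frac{3}{2\pi}\sin^{-1}\tfrac12\rho + \frac{3}{2\pi^2}(\sin^{-1}\rho)^2 - \frac{3}{2\pi^2}(\sin^{-1}\tfrac12\rho)^2\Big\}. \qquad (18)$$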


The other parameters cannot be determined exactly; their evaluation involves the same
type of difficulty as that pointed out by Kendall (1949) in evaluating the variance of
Spearman’s rank correlation coefficient $r_s$ exactly under normal correlation. However,
estimates of these parameters can be obtained fairly easily, because from the results of
the previous section we see that only samples of five need be considered. I have shown in
another paper (Sundrum, 1953) how a systematic method of sampling may be used in such
cases, and, in particular, used this method to obtain the frequency distribution of rankings
of three, four and five from a bivariate normal population with correlation $\rho = 1/\sqrt2$. The
theoretical values of three parameters (for the case $\rho = 1/\sqrt2$) and the sample estimates of
the other seven parameters, obtained from the above sample results, are as follows:

p = 0.7500,   a = 0.3470,
k = 0.5770,   b = 0.3485,
u = 0.4431,   w = 0.3451,     (19)
t = 0.4543,   x = 0.3363,
l = 0.4414,   y = 0.3642.
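The three theoretical entries of (19) can be recomputed from the closed forms for p, k and u. The sketch below is hedged: the expression used for u, with coefficients 3/(2π) and 3/(2π²), is an assumption adopted here because it gives 1/6 at ρ = 0, 1 at ρ = 1, 0 at ρ = −1 and agrees with the tabulated 0.4431; the function name is hypothetical.

```python
import math


def normal_case_parameters(rho):
    """p, k and u for a bivariate normal parent with correlation rho."""
    a = math.asin(rho) / math.pi          # (1/pi) arcsin(rho)
    b = math.asin(rho / 2) / math.pi      # (1/pi) arcsin(rho/2)
    p = 0.5 + a
    k = 5 / 18 + a + a * a - b * b
    u = 1 / 6 + 1.5 * (a - b) + 1.5 * (a * a - b * b)   # assumed form, see lead-in
    return p, k, u


p, k, u = normal_case_parameters(1 / math.sqrt(2))
print(round(p, 4), round(k, 4), round(u, 4))
```

At ρ = 0 the same function returns the null values 1/2, 5/18 and 1/6, which ties this section back to the independence case.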
Substituting these values in (10) and (12), we get

$$\mu_3 = \binom{n}{2}^{-2}\{-0.013625\,n^2 + 0.0007\,n - 0.0427\}, \qquad (20)$$

$$\mu_4 = \binom{n}{2}^{-3}\{0.0012615\,n^4 - 0.04316\,n^3 + 0.59027\,n^2 - 1.9285\,n + 1.70314\}. \qquad (21)$$


These formulae are not entirely satisfactory, and are given here only as an illustration of 
the method. Sampling experiments to improve these results and to obtain corresponding 
formulae for a range of values of p have been put in hand. 


However, the argument of this paper shows that the third and fourth moments of t depend
only on ten parameters, and that these parameters may be evaluated by finding the
probability distribution of rankings of five from correlated populations.

In conclusion, the author would like to express his indebtedness to Prof. M. G. Kendall
for suggesting the problem and for many valuable suggestions during the preparation of
this paper.


[Note added in proof.] With regard to the argument of section 2 on the upper bound of the
variance of t, it has been objected (I am indebted to Dr Hartley for pointing this out to me) that
the transfer of units between the quantities designated $P_i$ and $P_j$ is not always possible; i.e. that it
has not been proved that it is always possible to find a ranking in which the $P_i$’s have the altered
values. This objection may be met as follows:

The problem is essentially one of maximizing a certain function of the $P_i$’s, subject to $P = \sum_i P_i$,
and certain other constraints on the $P_i$’s by virtue of their definition, and the object of showing that
there is a ranking corresponding to the altered values of the $P_i$’s is to show that the set of $P_i$’s
satisfies all the constraints at every stage of the maximization procedure. In section 2 only some of
these constraints are taken into account during the maximization procedure. However, it is shown
that the maximizing set of $P_i$’s does satisfy all the constraints because it corresponds to an actual
ranking, i.e. the canonical ranking.

Now, a quantity which is maximized subject to only a part of the set of all constraints must be
greater than or equal to the maximum subject to all constraints. If, however, the solution so obtained
satisfies all constraints, then it must be equal to the maximum subject to all constraints.

99.9 AND 0.1 % POINTS OF THE χ² DISTRIBUTION
By T. LEWIS, Statistical Advisory Unit, Ministry of Supply


1. In answer to a special inquiry the lower 0.1 % points of the χ² distribution have been
computed correct to 8 decimal places for degrees of freedom n = 1(1) 30(10) 100, 120, and
also the upper 0.1 % points for n = 40(10) 100, 120. The resulting values, which it is thought
may be of value to other statisticians, are given in Tables 1 and 2 below.


Table 1. Lower 0.1 % points of χ²

Degrees of                   Degrees of                   Degrees of
freedom     Value of χ²      freedom     Value of χ²      freedom     Value of χ²
    1       0.0000 0157         16       3.9416 2784         40      17.9164 2654
    2       0.0020 0100         17       4.4160 9272
    3       0.0242 9759         18       4.9048 4881         50      24.6739 0527
    4       0.0908 0404         19       5.4068 1602
    5       0.2102 1260         20       5.9210 4075         60      31.7383 4159
    6       0.3810 6676         21       6.4466 7656         70      39.0363 7739
    7       0.5984 9375         22       6.9829 6844
    8       0.8571 0483         23       7.5292 3977         80      46.5198 7592
    9       1.1519 4955         24       8.0848 8158
   10       1.4787 4346         25       8.6493 4363         90      54.1552 4356
   11       1.8338 5266         26       9.2221 2682        100      61.9179 3921
   12       2.2142 0932         27       9.8027 7692
   13       2.6172 1815         28      10.3908 7912        120      77.7551 4043
   14       3.0406 7252         29      10.9860 5349
   15       3.4826 8447         30      11.5879 5105

Table 2. Upper 0.1 % points of χ²

Degrees of                   Degrees of
freedom     Value of χ²      freedom     Value of χ²
   40       73.4019 5752        80      124.8392 2402
   50       86.6608 1519        90      137.2083 5413
   60       99.6072 3307       100      149.4492 5278
   70      112.3169 3185       120      173.6174 3646

METHOD OF CALCULATION 
2. We write n for degrees of freedom, $m = \tfrac12 n$ and $v = \tfrac12\chi^2$. If, then, 2v is the 100P % point
of the χ² distribution for n degrees of freedom, and r! is written for Γ(r + 1) whether r is
integral or not,

$$\frac{1}{(m-1)!}\int_v^\infty e^{-t}\,t^{m-1}\,dt = P,$$

or, in other words,

$$f(v) = 0,$$

where
$$f(v) = \Big\{1 + v + \frac{v^2}{2!} + \dots + \frac{v^{m-1}}{(m-1)!}\Big\} - Pe^{v} \quad [n\ \text{even}], \qquad (1)$$

$$f(v) = \Big\{\frac{2}{\sqrt\pi}\,e^{v}\int_{\sqrt v}^\infty e^{-t^2}\,dt + \frac{v^{1/2}}{(\tfrac12)!} + \frac{v^{3/2}}{(\tfrac32)!} + \dots + \frac{v^{m-1}}{(m-1)!}\Big\} - Pe^{v} \quad [n\ \text{odd}]. \qquad (2)$$

The exponential or quasi-exponential sum in curly brackets, i.e. $f(v) + Pe^{v}$, we write as
$S_{m-1}$ and the term $v^{m-1}/(m-1)!$ as $T_{m-1}$.


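3. Each required value of v(m, P) was obtained by computing f(v) directly for an approximate
value of v, $v = v_0$ say, and improving $v_0$ iteratively to $v_0 + \delta v$ by means of the formula

$$\delta v = \Delta + \alpha\Delta^2 + \beta\Delta^3, \qquad (3)$$

where
$$\Delta = \Big(\frac{f(v)}{T_{m-1}}\Big)_{v=v_0}, \qquad
\alpha = \frac12\,\frac{T_{m-1} - T_{m-2}}{T_{m-1}}, \qquad
\beta = 2\alpha^2 - \frac16\,\frac{T_{m-1} - 2T_{m-2} + T_{m-3}}{T_{m-1}}. \qquad (4)$$

This formula for δv follows from the Taylor expansion of v as a function of the small
quantity $w = e^{-v}f(v)$, thus:
$v = v_0$ when $w = e^{-v_0}f(v_0)$, $v = v_0 + \delta v$ when $w = 0$, hence

$$\delta v = -e^{-v_0}f(v_0)\Big(\frac{dv}{dw}\Big)_0 + \frac12 e^{-2v_0}f^2(v_0)\Big(\frac{d^2v}{dw^2}\Big)_0 - \frac16 e^{-3v_0}f^3(v_0)\Big(\frac{d^3v}{dw^3}\Big)_0 + \dots.$$

But $w = e^{-v}S_{m-1} - P$, hence

$$\frac{dw}{dv} = e^{-v}(S_{m-2} - S_{m-1}) = -e^{-v}T_{m-1},$$

$$\frac{d^2w}{dv^2} = e^{-v}(T_{m-1} - T_{m-2}),$$

$$\frac{d^3w}{dv^3} = -e^{-v}(T_{m-1} - 2T_{m-2} + T_{m-3}),$$

whence the values of the inverse derivatives $d^r v/dw^r$ and the relations (3) and (4) readily
follow.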
OUTLINE OF PROCEDURE 


4. For the first approximation $v_0$, use was made of the well-known formula

$$\sqrt[3]{(v/m)} \sim 1 - \frac{1}{9m} + \frac{x}{3\sqrt m},$$

due to Wilson & Hilferty (1931), in which the 100P % value of v is related to the corresponding
unit normal deviate x. The terms $T_r = v_0^r/r!$ were then computed in succession, starting
from the greatest term $T_g$; the extra term

$$\sqrt{\frac{2}{\pi}}\;e^{v}\int_{\sqrt{(2v)}}^\infty e^{-\frac12 t^2}\,dt$$

required for odd n was obtained separately from Sheppard’s (1939) 12-decimal table of
$e^{\frac12 x^2}\int_x^\infty e^{-\frac12 t^2}\,dt$ and its derivatives.
The terms $T_r$ were then added together to give the sum $S_{m-1}$ and also the last few partial
sums ..., $S_{m-3}$, $S_{m-2}$. Using the National Bureau of Standards (1951) 15- and 18-decimal
tables of $e^{\pm x}$, $e^{v_0}$ was obtained to the requisite number of places; the difference

$$f(v_0) = S_{m-1} - Pe^{v_0}$$

was then taken, and the correction δv computed from the relations (3) and (4).
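The procedure of paras. 3 and 4 can be sketched in a few lines for even n. This is an illustration only, not the original hand computation: it uses double-precision arithmetic, recomputes the terms at every step instead of using the Taylor series, assumes the forms of α and β stated in (4), and the function names are hypothetical. P denotes the upper-tail probability, so P = 0.999 gives the lower 0.1 % point.

```python
import math

X_001 = 3.0902323062  # normal deviate for P = 0.001, as quoted in para. 7


def terms(v, m):
    """T_r = v**r / r! for r = 0, ..., m-1 (n even, so m is an integer)."""
    t = [1.0]
    for r in range(1, m):
        t.append(t[-1] * v / r)
    return t


def refine(v, m, P):
    """One application of the correction (3), with Delta, alpha, beta as in (4)."""
    t = terms(v, m)
    f = sum(t) - P * math.exp(v)          # f(v) = S_{m-1} - P e^v
    t1, t2, t3 = t[m - 1], t[m - 2], t[m - 3]
    delta = f / t1
    alpha = (t1 - t2) / (2 * t1)
    beta = 2 * alpha ** 2 - (t1 - 2 * t2 + t3) / (6 * t1)
    return v + delta + alpha * delta ** 2 + beta * delta ** 3


def chi2_point(n, P, x, iterations=4):
    """100P % point of chi-square for even n >= 6, from a Wilson-Hilferty start."""
    m = n // 2
    v = m * (1 - 1 / (9 * m) + x / (3 * math.sqrt(m))) ** 3
    for _ in range(iterations):
        v = refine(v, m, P)
    return 2 * v


print(chi2_point(40, 0.001, X_001))    # upper 0.1 % point, compare Table 2
print(chi2_point(40, 0.999, -X_001))   # lower 0.1 % point, compare Table 1
```

Because the correction is of third order, a few iterations from the Wilson-Hilferty value are enough at this precision.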

For the next iteration, with $v = v_0 + \delta v$, no computation of terms $T_r$ was necessary to
obtain the new sum $S_{m-1}(v_0 + \delta v)$, this being calculated with ease from the first few terms of
the Taylor series

$$S_{m-1}(v_0 + \delta v) = S_{m-1} + \delta v\,S_{m-2} + \tfrac12\delta v^2\,S_{m-3} + \tfrac16\delta v^3\,S_{m-4} + \dots, \qquad (5)$$

the $S_r$ on the right-hand side being the values for $v = v_0$ already recorded. In the same way
$T_{m-1}(v_0 + \delta v)$ was obtained from the Taylor series

$$T_{m-1} + \delta v\,T_{m-2} + \tfrac12\delta v^2\,T_{m-3} + \dots.$$

After this, one further iteration at most was required, the sum $S_{m-1}(v_0 + \delta v + \delta v')$ being
evaluated as above by using the accumulated correction $\delta v + \delta v'$ in the Taylor series (5).


ACCURACY OF METHOD 


5. In the above process, v is determined as the value for which the ‘discrepancy’
$f(v) = S_{m-1} - Pe^{v}$ comes out to zero. If in fact $S_{m-1} - Pe^{v}$ differs from 0 by a small amount
ε, the error $\delta_1 v$ in v (so written because opposite in sign to the δv of paras. 3 and 4), is given by

$$\epsilon = \delta(S_{m-1} - Pe^{v}) = (S_{m-2} - Pe^{v})\,\delta_1 v \sim (S_{m-2} - S_{m-1})\,\delta_1 v = -T_{m-1}\,\delta_1 v;$$

i.e. $\delta_1 v \sim -\epsilon/T_{m-1}$.
To estimate ε, let us suppose that the pivotal term $T_g$ is calculated to N significant figures,
1 unit in the last place being denoted by u, so that $10^{-(N-1)}T_g > u > 10^{-N}T_g$. Clearly $T_g$ and
the few terms $T_r$ of like order (i.e. with the same number of figures before the decimal point)
have errors of order u, the terms of next lower order (with one less figure before the decimal
point) have errors of order $10^{-1}u$, and so on; the error in $Pe^{v}$ is negligible. Hence ε is at most
of order $10^{-(N-1)}T_g$.

When P = 0.001, $T_g = T_{m-1}$, hence $|\delta_1 v|$ is of order $10^{-(N-1)}$, and v is certainly correct to
N − 2 decimal places.

When P = 0-999, Ty|Tm—1~ 8 = e809? ~ 102, 


hence $|\delta_1 v|$ is of order $10^{-(N-1)}\,T_g/T_{m-1}$, i.e. $10^{-(N-3)}$.


Thus v is certainly correct to N − 4 decimal places.
Accordingly the terms T were calculated to 12 significant figures for P = 0.999 and to
10 significant figures for P = 0.001.
CHECKING OF FINAL RESULTS BY DIFFERENCING 
6. It can readily be shown that $\chi^2_n(P)$, the 100P % point of χ² for n degrees of freedom,
is of the form
$$n + a\,n^{1/2} + b + c\,n^{-1/2} + d\,n^{-1} + e\,n^{-3/2} + \dots, \qquad (6)$$

where a, b, c, ... are functions of P.
For this reason, direct differencing of the results $\chi^2_n(0.999)$ against n for n = 1(1) 30 and
again for n = 30(10) 100, and of $\chi^2_n(0.001)$ for n = 40(10) 100, was not practicable. An
attempt to base the differencing check on the residuals

{$\chi^2_n(P)$ − corresponding Wilson-Hilferty approximation}


also failed. In the end it was found possible to check all the results for n > 14 by a differencing 
method based on the accurate evaluation of series (6) as far as the term in n-*, and to check 
the results for n < 14 by means of another series. Details of these series are given in paras. 7 
and 8 respectively, and the steps in the checking procedure are outlined in para. 9. 


7. Series for χ² in terms of the corresponding normal deviate. If x is the 100P % normal
deviate, so that
$$\frac{1}{\sqrt{(2\pi)}}\int_x^\infty e^{-\frac12 t^2}\,dt = P = \frac{1}{\Gamma(\tfrac12 n)}\int_{\frac12\chi^2}^\infty e^{-t}\,t^{\frac12 n-1}\,dt,$$
then the coefficients a, b, c, ... in (6) can be expressed as polynomials in x. Developing the
series for convenience in terms of $m = \tfrac12 n$ instead of n, we may write
$$\chi^2 = 2m + a_0 x\,m^{1/2} + (b_0x^2 + b_1) + (c_0x^3 + c_1x)\,m^{-1/2} + (d_0x^4 + d_1x^2 + d_2)\,m^{-1} + \dots, \qquad (7)$$

where $a_0, b_0, b_1, c_0, \dots$ are constants. The values of these constants can be determined by
equating moments about zero for the left- and right-hand sides of (7), bearing in mind that
the rth moment of χ² is $2^r m(m+1)\dots(m+r-1)$ and that of x, 0 when r is odd and
(r − 1)(r − 3) ... 3.1 when r is even. For example, if we take (7) for convenience in the form

$$\chi^2 - 2m = a_0x\,m^{1/2} + (b_0x^2 + b_1) + O(m^{-1/2})$$
and equate 1st, 2nd and 3rd moments each side, we get
$$2m - 2m = b_0 + b_1 + O(m^{-1}),$$
$$4m(m+1) - 4m\cdot 2m + 4m^2 = a_0^2\,m + O(1),$$
$$8m(m+1)(m+2) - 6m\cdot 4m(m+1) + 12m^2\cdot 2m - 8m^3 = 3a_0^2(3b_0 + b_1)\,m + O(1),$$
whence $b_0 + b_1 = 0$, $a_0^2 = 4$, $3b_0 + b_1 = \tfrac43$. Clearly the sign of $a_0$ is +, since χ² > 2m when x > 0;
thus series (7) begins $\chi^2 = 2m + 2x\,m^{1/2} + \tfrac23(x^2 - 1) + \dots$.
In this manner the expansion may be taken to any required number of terms; to order
$m^{-3/2}$ it is
$$\chi^2 = 2m + 2x\,m^{1/2} + \tfrac23(x^2 - 1) + \tfrac{1}{18}(x^3 - 7x)\,m^{-1/2}$$
$$\qquad - \tfrac{1}{405}(3x^4 + 7x^2 - 16)\,m^{-1} + \tfrac{1}{19440}(9x^5 + 256x^3 - 433x)\,m^{-3/2} + \dots. \qquad (8)$$

In particular, when P = 0.001, x = +3.09023 23062,

and when P = 0.999, x = −3.09023 23062;
hence, from (8),

$$\chi^2_{2m}(0.001) = 2m + 6.18046\,46124\,m^{1/2} + 5.69969\,04708$$
$$\qquad + 0.43770\,32003\,m^{-1/2} - 0.80105\,59174\,m^{-1} + 0.45024\,93635\,m^{-3/2} + \dots \qquad (9)$$
$$\quad = 2m + u_{1/2}\,m^{1/2} + u_0 + u_{-1/2}\,m^{-1/2} + u_{-1}\,m^{-1} + u_{-3/2}\,m^{-3/2} + \dots, \text{ say}; \qquad (10)$$

and
$$\chi^2_{2m}(0.999) = 2m - u_{1/2}\,m^{1/2} + u_0 - u_{-1/2}\,m^{-1/2} + u_{-1}\,m^{-1} - u_{-3/2}\,m^{-3/2} + \dots. \qquad (11)$$
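Series (8) is easy to evaluate directly, which makes the agreement with the tabulated points visible. The following is a minimal sketch with a hypothetical function name; it truncates the series after the term in $m^{-3/2}$, so only four or so decimals can be expected to agree at n = 100.

```python
import math

X_001 = 3.0902323062  # normal deviate for P = 0.001


def chi2_series(n, x):
    """Series (8) in m = n/2, truncated after the m**(-3/2) term."""
    m = n / 2
    return (2 * m
            + 2 * x * math.sqrt(m)
            + (2 / 3) * (x * x - 1)
            + (x ** 3 - 7 * x) / (18 * math.sqrt(m))
            - (3 * x ** 4 + 7 * x * x - 16) / (405 * m)
            + (9 * x ** 5 + 256 * x ** 3 - 433 * x) / (19440 * m ** 1.5))


print(chi2_series(100, X_001))    # compare the upper 0.1 % point for n = 100
print(chi2_series(100, -X_001))   # compare the lower 0.1 % point for n = 100
```

Changing the sign of x reverses the signs of the half-integer powers only, which is the relation between (10) and (11).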





8. Check series for $\chi^2_n(P)$ when n is small and P nearly 1. It is clear from (1) and (2) that
when P is near to unity, so that v/m is small,

$$(1-P)\,e^{v} = \frac{v^m}{m!} + \frac{v^{m+1}}{(m+1)!} + \dots
= \frac{v^m}{m!}\Big(1 + \frac{v}{m+1} + \frac{v^2}{(m+1)(m+2)} + \dots\Big);$$

hence
$$v^m = m!\,(1-P)\,e^{v}\Big(1 + \frac{v}{m+1} + \dots\Big)^{-1},$$
or
$$v = [m!\,(1-P)]^{1/m}\,e^{v/m}\Big(1 + \frac{v}{m+1} + \dots\Big)^{-1/m};$$

on expanding the exponential and binomial, multiplying together, and putting $2v = \chi^2$,
this gives

$$\chi^2 = 2[m!\,(1-P)]^{1/m}\Big\{1 + \frac{\chi^2}{n+2} + \frac{(\chi^2)^2}{2(n+2)(n+4)}
+ \frac{(\chi^2)^3}{6(n+2)(n+4)(n+6)}\cdot\frac{n-2}{n+2}$$
$$\qquad + \frac{(\chi^2)^4}{24(n+2)(n+4)(n+6)(n+8)}\cdot\frac{n^3 - 20n^2 - 124n - 64}{(n+2)^2(n+4)}$$
$$\qquad + \frac{(\chi^2)^5}{120(n+2)(n+4)(n+6)(n+8)(n+10)}\cdot\frac{(n-2)(n^3 - 136n^2 - 916n - 704)}{(n+2)^3(n+4)}
+ \dots\Big\}. \qquad (12)$$


9. Checking procedure. The comparison between series (8) and the computed values of
$\chi^2$ furnished an adequate basis for a differencing check over the higher range of values of
n, say n > 30, but not for n between say 15 and 30. On the other hand, the algebraic
computation of further polynomial coefficients in (8) would have been laborious. The following
method was therefore adopted:

Using equations (9), (10) and (11) and the set of paired results $\chi^2_n(0.001)$, $\chi^2_n(0.999)$
available for n = 40(10)100, the quantities
$$g_n = \tfrac{1}{2}[\chi^2_n(0.001) + \chi^2_n(0.999)] - 2m - u_0 - u_{-1}m^{-1},$$
$$h_n = \tfrac{1}{2}[\chi^2_n(0.001) - \chi^2_n(0.999)] - u_{1/2}m^{1/2} - u_{-1/2}m^{-1/2} - u_{-3/2}m^{-3/2},$$
were calculated, checked by differencing, and recomputed where necessary. Estimates
$\hat{u}_{-2}, \hat{u}_{-3}, \hat{u}_{-4}$ of the coefficients of $m^{-2}, m^{-3}, m^{-4}$ in (9) were then obtained by fitting the seven
quantities $g_n$ to the relationship
$$g_n = \hat{u}_{-2}m^{-2} + \hat{u}_{-3}m^{-3} + \hat{u}_{-4}m^{-4};$$
similarly, $\hat{u}_{-5/2}, \hat{u}_{-7/2}, \hat{u}_{-9/2}$ were obtained from the $h_n$.
The values of $\chi^2_n(0.999)$ for n = 14(1)30 were now checked by differencing the residuals
$$\chi^2 - (2m - u_{1/2}m^{1/2} + u_0 - \ldots - u_{-3/2}m^{-3/2} + \hat{u}_{-2}m^{-2} - \hat{u}_{-5/2}m^{-5/2} + \ldots - \hat{u}_{-9/2}m^{-9/2}).$$

For n = 1(1)14 the values of $\chi^2_n(0.999)$ were checked from equation (12) by evaluating
for each given value of $\chi^2$ the discrepancy between the two sides of the equation and
differencing the quantities so obtained.
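The differencing check relied on throughout this section can be illustrated as follows. The sketch uses a hypothetical table (a cubic, not the chi-squared residuals themselves) and a hypothetical copying error; it shows only the general principle that an isolated error in a smooth table appears in the fourth differences multiplied by the binomial pattern 1, -4, 6, -4, 1.

```python
def differences(values, order):
    """Return the finite differences of the given order of a tabulated sequence."""
    diffs = list(values)
    for _ in range(order):
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return diffs


# Hypothetical smooth table: a cubic tabulated at n = 0, 1, ..., 11.
table = [n**3 - 2 * n**2 + 5 for n in range(12)]

# The same table with an isolated (hypothetical) error of 7 units in one entry.
corrupted = list(table)
corrupted[6] += 7

clean_d4 = differences(table, 4)    # fourth differences of a cubic vanish
bad_d4 = differences(corrupted, 4)  # the error shows up as 7 * (1, -4, 6, -4, 1)

if __name__ == "__main__":
    print(clean_d4)
    print(bad_d4)
```

Subtracting a fitted series before differencing, as described above, serves to make the tabulated residuals smooth enough for such a pattern to stand out.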
STANDARD APPROXIMATIONS TO $\chi^2$ WHEN n IS LARGE


10. It is of interest to compare the two most familiar approximations for $\chi^2$ with the
exact series obtained above.


I. $\sqrt{2\chi^2}$ distributed approximately normally with mean $\sqrt{2n-1}$ and variance 1
(Fisher).
This gives $$\chi^2 = n + x\sqrt{2}\,n^{1/2} + \tfrac{1}{2}(x^2 - 1) + O(n^{-1/2}).$$


II. $\sqrt[3]{\chi^2/n}$ distributed approximately normally with mean $1 - 2/(9n)$ and variance $2/(9n)$
(Wilson-Hilferty).
This gives $$\chi^2 = n + x\sqrt{2}\,n^{1/2} + \tfrac{2}{3}(x^2 - 1) + \tfrac{2\sqrt{2}}{27}(x^3 - 6x)\,n^{-1/2} + O(n^{-1}).$$

III. Exact series (equation (8)):
$$\chi^2 = n + x\sqrt{2}\,n^{1/2} + \tfrac{2}{3}(x^2 - 1) + \tfrac{\sqrt{2}}{18}(x^3 - 7x)\,n^{-1/2} + O(n^{-1}).$$
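The three forms can be compared numerically. The sketch below evaluates approximations I and II in closed form and series (8) truncated after its $m^{-3/2}$ term; the function names are illustrative, and the choice n = 10 at the upper 0.1% point is only an example.

```python
import math

X_001 = 3.0902323062  # normal deviate for P = 0.001, quoted earlier in the text


def fisher(n, x):
    """Approximation I: sqrt(2 chi^2) normal with mean sqrt(2n - 1), variance 1."""
    return 0.5 * (x + math.sqrt(2 * n - 1)) ** 2


def wilson_hilferty(n, x):
    """Approximation II: cube root of chi^2/n normal, mean 1 - 2/(9n), variance 2/(9n)."""
    return n * (1 - 2 / (9 * n) + x * math.sqrt(2 / (9 * n))) ** 3


def exact_series(n, x):
    """Series (8) with m = n/2, truncated after the m^(-3/2) term."""
    m = n / 2
    return (2 * m + 2 * x * math.sqrt(m) + 2 * (x**2 - 1) / 3
            + (x**3 - 7 * x) / 18 / math.sqrt(m)
            - (3 * x**4 + 7 * x**2 - 16) / 405 / m
            + (9 * x**5 + 256 * x**3 - 433 * x) / 19440 / m**1.5)


if __name__ == "__main__":
    for n in (10, 30, 100):
        print(n, fisher(n, X_001), wilson_hilferty(n, X_001), exact_series(n, X_001))
```

At this extreme percentage point Fisher's form falls below the series value and the Wilson-Hilferty form lies above it, in line with the differing coefficients of the constant and $n^{-1/2}$ terms shown above.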


I wish to thank Mr E. C. Fieller for his advice on method, and my colleagues in the 
Statistical Advisory Unit, Ministry of Supply, for their collaboration in the computing work. 
I also wish to make acknowledgement to the Chief Scientist, Ministry of Supply, for
permission to publish this paper.
TABLES OF SYMMETRIC FUNCTIONS. PART IV
By F. N. DAVID AND M. G. KENDALL


1. In recent publications (David & Kendall, 1949, 1951) we have given, for functions up 
to and including weight 12, tables of the monomial symmetric functions (M) in terms of 
the one-part functions (S) and vice versa, tables of the unitary functions (U) in terms of 
M and vice versa, and tables of the homogeneous product sums (H) in terms of U. Pursuing
our self-imposed task of compiling a definitive set of tables of symmetric functions we here
present the MH-HM tables. In a further publication we shall complete the set by presenting
the US-SU tables, the final tables HS-SH being equivalent to the US-SU tables with 
suitable changes of sign. 


2. Many methods can be used for compiling and checking the present tables. One of the
simplest is to start from the relationship
$$n!\,h_n = \sum \frac{n!}{1^{\pi_1}2^{\pi_2}\ldots k^{\pi_k}\,\pi_1!\,\pi_2!\ldots\pi_k!}\; s_1^{\pi_1}s_2^{\pi_2}\ldots s_k^{\pi_k}.$$

The S-functions can be turned into monomial symmetric functions using Part I of our
tables and the result follows. Conversely, we have
$$s_n = \sum (-1)^{\pi_1+\pi_2+\ldots+\pi_k+1}\,\frac{(\pi_1+\pi_2+\ldots+\pi_k-1)!\,n}{\pi_1!\,\pi_2!\ldots\pi_k!}\; h_1^{\pi_1}h_2^{\pi_2}\ldots h_k^{\pi_k}.$$

We take the monomial symmetric function and express it in terms of the S-functions
from Part I tables. Substitution for s in terms of h gives the required result. The whole of
the tables HM was checked by using MacMahon’s D operator to build up each table 
successively. The converse tables MH were checked in the same way, first by using the 
operator and then by solving the simple equations of relationship which result. For the MH 
tables we have the additional check that the sum of the coefficients is zero. 
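The converse relationship, giving $s_n$ in terms of the h's, lends itself to a short computation. The sketch below is illustrative only: it assumes the sum runs over all partitions $(1^{\pi_1}2^{\pi_2}\ldots k^{\pi_k})$ of n, and the helper names `partitions` and `power_sum_in_h` are hypothetical.

```python
from math import factorial


def partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing tuples of parts."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def power_sum_in_h(n):
    """Coefficients of s_n in products of h's, keyed by the partition of subscripts."""
    coeffs = {}
    for parts in partitions(n):
        # Multiplicities pi_j of the distinct parts j that actually occur.
        multiplicities = [parts.count(j) for j in set(parts)]
        total = sum(multiplicities)
        denominator = 1
        for p in multiplicities:
            denominator *= factorial(p)
        sign = (-1) ** (total + 1)
        coeffs[parts] = sign * (factorial(total - 1) * n // denominator)
    return coeffs


if __name__ == "__main__":
    # s_3 = 3 h_3 - 3 h_2 h_1 + h_1^3
    print(power_sum_in_h(3))
```

Setting every h equal to 1 corresponds to a single variable equal to 1, so the coefficients of $s_n$ add to 1, while those of any monomial function with more than one part add to zero.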


3. In the introduction to Parts II and III we mentioned the uses of the U-functions in 
problems of distribution. The H-functions can be used in an analogous way and we quote 
therefore only one example. A distribution theorem due to MacMahon can be stated as 
follows: 

‘The number of ways of distributing objects of specification $(p_1p_2\ldots p_r)$ into boxes of
specification $(1^m)$, no box being empty, is equal to the coefficient of
$$x^{p_1+p_2+\ldots+p_r}\,(p_1p_2\ldots p_r)$$
in the development of the function
$$(h_1x + h_2x^2 + \ldots)^m.$$’
When we come to use this theorem it is found that the easiest method of attack is the
expression of the powers and products of the H-functions in terms of the M-functions. This
is one of the simplest theorems in which the tables will be found of use.
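A small numerical check of the theorem can be sketched as follows. Two assumptions are made here and are not taken from the paper: boxes of specification $(1^m)$ are treated as m distinguishable boxes, and the coefficient of a monomial function in a product of h's is computed as the number of non-negative integer matrices with prescribed row and column sums (a standard combinatorial interpretation). All function names are hypothetical.

```python
from itertools import product


def count_matrices(rows, cols):
    """Number of non-negative integer matrices with given row and column sums."""
    if not rows:
        return 1 if all(c == 0 for c in cols) else 0
    total = 0
    # Fill the first row in every way that does not exceed the column sums.
    for row in product(*(range(min(rows[0], c) + 1) for c in cols)):
        if sum(row) == rows[0]:
            remaining = tuple(c - r for c, r in zip(cols, row))
            total += count_matrices(rows[1:], remaining)
    return total


def compositions(n, k):
    """Yield ordered k-tuples of positive integers summing to n."""
    if k == 1:
        if n >= 1:
            yield (n,)
        return
    for first in range(1, n - k + 2):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def distributions_via_theorem(spec, m):
    """Coefficient of x^w (spec) in (h1 x + h2 x^2 + ...)^m, where w = sum(spec)."""
    return sum(count_matrices(spec, c) for c in compositions(sum(spec), m))


def distributions_brute_force(spec, m):
    """Directly count placements of the objects into m distinct boxes, none empty."""
    # For each kind of object, list the ways of splitting its copies over the boxes.
    per_kind = [[s for s in product(range(p + 1), repeat=m) if sum(s) == p]
                for p in spec]
    count = 0
    for choice in product(*per_kind):
        if all(sum(kind[b] for kind in choice) > 0 for b in range(m)):
            count += 1
    return count
```

For objects of specification (21) and two boxes, both routes give 4: the coefficient of $x^3$ is $2h_2h_1$, and (21) occurs in $h_2h_1$ with coefficient 2.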


4. In order to save space, when the tables have been set out use has been made of Meyer
Hirsch’s law of symmetry which states that the coefficient of the monomial symmetric
function $(\pi_1\pi_2\ldots\pi_r)$ in the expansion of $h_{l_1}h_{l_2}\ldots h_{l_m}$ is equal to the coefficient of the monomial
symmetric function $(l_1l_2\ldots l_m)$ in the expansion of $h_{\pi_1}h_{\pi_2}\ldots h_{\pi_r}$. This enables each table to be
written as a triangle and the only difficulty in putting the two parts together has been at the
diagonal. In order to express the M-functions in terms of the H-functions we read horizontally
until the diagonal in bold type is reached and then continue vertically downwards to the
edge of the table. Thus for w = 5 we have
$$(31^2) = 3h_1^5 - 13h_2h_1^3 + 10h_2^2h_1 + 12h_3h_1^2 - 8h_3h_2 - 9h_4h_1 + 5h_5.$$
In order to express H-functions in terms of the M-functions we read vertically until the
diagonal in bold type is reached and then continue horizontally to the right to the edge of
the table. Thus, again for w = 5,
$$h_3h_1^2 = 20(1^5) + 13(21^3) + 8(2^21) + 7(31^2) + 4(32) + 3(41) + (5).$$
The two diagonal figures are separated by a bold horizontal rule.
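The two worked examples for w = 5 can be reproduced by a short computation. This is a sketch under assumptions rather than the authors' method of compilation: the HM coefficients are obtained by counting non-negative integer matrices with prescribed row and column sums, the MH coefficients by exact inversion of that matrix, and the partitions are taken in lexicographic order (which for w = 5 agrees with the order of the table). All names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing tuples of parts."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def count_matrices(rows, cols):
    """Number of non-negative integer matrices with given row and column sums."""
    if not rows:
        return 1 if all(c == 0 for c in cols) else 0
    total = 0
    for row in product(*(range(min(rows[0], c) + 1) for c in cols)):
        if sum(row) == rows[0]:
            remaining = tuple(c - r for c, r in zip(cols, row))
            total += count_matrices(rows[1:], remaining)
    return total


def invert(matrix):
    """Invert a square integer matrix exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [[int(v) for v in row[n:]] for row in aug]


parts = sorted(partitions(5))  # (1^5), (21^3), (2^2 1), (31^2), (32), (41), (5)
HM = [[count_matrices(a, b) for b in parts] for a in parts]  # m_a in h_b
MH = invert(HM)                                              # h_b in m_a
```

With these definitions the column of `HM` for (3, 1, 1) gives the expansion of $h_3h_1^2$, and the corresponding row of `MH` gives the expansion of $(31^2)$; the symmetry of `HM` is the law of symmetry quoted above.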
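In each row the upper line gives the HM coefficients (diagonal and to the right) and the lower line gives the MH coefficients (up to and including the diagonal).

Table 4.2
w = 2   |  h1^2    h2
(1^2)   |     2     1
        |     1
(2)     |           1
        |    -1     2

Table 4.3
w = 3   |  h1^3   h2h1    h3
(1^3)   |     6      3     1
        |     1
(21)    |            2     1
        |    -2      5
(3)     |                  1
        |     1     -3     3

Table 4.4
w = 4   |  h1^4  h2h1^2  h2^2   h3h1    h4
(1^4)   |    24      12     6      4     1
        |     1
(21^2)  |             7     4      3     1
        |    -3      10
(2^2)   |                   3      2     1
        |     1      -4     3
(31)    |                          2     1
        |     2      -7     2      7
(4)     |                                1
        |    -1       4    -2     -4     4

Table 4.5
w = 5   |  h1^5  h2h1^3  h2^2h1  h3h1^2  h3h2  h4h1    h5
(1^5)   |   120      60      30      20    10     5     1
        |     1
(21^3)  |            33      18      13     7     4     1
        |    -4      17
(2^2 1) |                    11       8     5     3     1
        |     3     -14      14
(31^2)  |                             7     4     3     1
        |     3     -13      10      12
(32)    |                                   3     2     1
        |    -2      10     -11      -8    11
(41)    |                                         2     1
        |    -2       9      -7      -9     5     9
(5)     |                                               1
        |     1      -5       5       5    -5    -5     5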
Table 4.6 
he w=6 | Ay Phys hythy® — hy® igh? Iighgh, = Iyghy® = ag® ~—ightg igh, ig 
a 720 360 180 go 120 60 30 20 1s 6 I 
‘ 192 102 54 72 38 21 14 11 5 I 
h @r| _, a 
e (2%1*) 58 33 42 24 14 10 8 ee 
of 6 -33 “6 
| (2) ar 24 1s 9 7 6 3 I 
—1 6 —10 4 ‘ 
3 34 19 13 7 4 I 
Gr) 4 ar 26 -4 19 8 ‘ 
12 5 3 r | 
(321) -6 34 «(48 8 -30 61 
2 ue 4 - i 
(4r") -3 16 20 4 Is 21 15 
s 4 3 a 5% 
(3") I -6 9 -1I 6 —15 -3 6 
3 2 I 
(42) 2 —12 19 -6 10 —20 —10 3 14 
2 1 
(st) 2 -11 14 -2 11 —17 -11 3 6 Ir 
I 
@) _, 6 -9 2 -6 12 6 -3 -6 -6 "6 
Table 4.7 
w=7 | Ay” Aghy® hgthy® hyehy hyhy* hghghy* hyhy® hsthy hyhy® hghghy Igy hgh? Iighy Ih, In 
(1?) 5040 2520 1260 630 840 420 210 140 210+ 105 35 42 21 7 I 
I 
(215) 1320 690 360 480 250 = 130 90 3=««135 70° 25 31 16 6 z 
—6 37 
(2*x*) 78 207 +270 = 148 81 58 84 46 18 22 12 5 I 
10 -64 117 
(2*x) 120 150 87 $1 37 si 30 13 15 9 4 I 
—4 27 —54 30 
(31°) 208 114 62 46 73 39 15 21 11 5 I 
Ss ~=g8 53 —21 28 
(j21°) aT D2. 4 Ue 14 8 ee 
—12 78 -—143 62 -70 195 
(32) 25 19 25 16 8 9 6 3 I 
3 -a!I 44 —26 17 —54 
(3") 16 20 13 7 8 5 3 I 
3 —-20 37 —-14 20 —-60 13 25 
(41°) 3419 8 8613 7 4 1! 
—- 0a, 58 18 —23 3 -g -%: SB 
(421) 12 6 8 5 3 I 
6 -40 76 -—36 36 —101 30 29 —33 64 
(43) —t : 1 
-2 144 —28 13-14 45 -13 —-I19 Ir —26 19 
2 7 4 3 I 
(s1") 3 -I19 33-14 18 —43 12 10 —18 25 -7 18 
(s2) — , , 
-2 14 —-29 17 —12 36 -17 -7 12 —24 7 -12 17 
(61) Pest I 
—2 33 «=~ sae 33 -7 —t0 13 -20 7 -%3 7 13 
I 
” I = 4 “7 7. ‘ae 7 te 4 =F . =. Po 7 
Table 4.8
w=8(i)| h,° Iah® heh  hy*h,? hat hshyS —Iighghy® Iyghyth, hy*h;* —Iaa* hg 
a) ae 20160 10080 5040 2520 6720 3360 1680 1120 560 
(21°) 10440 5400 279° 1440 3720 1920 990 680 35° 
at 50 
(2*r4) 2892 1548 828 2040 1092 584 412 220 
15 -II0 251 
i 861 480 I1I0 618 345 248 139 
(2°1*) 
-10 76 — 183 146 
(24) 282 600 348 204 148 88 
I -8 21 —20 5 
Gr) 1520 820 440 320 170 
6 —43 04 — 63 6 39 
(321°) 463 260 194 108 
—20 148 -—338 240 —24 —134 ~ 489 
(32"1) 154 116 69 
12 -—93 228 —182 20 81 —322 258 
(3*1*) 92 54 
6 —45 103 —70 6 44 — 165 102 66 
(32) —35 
=3 24 —61 50 = —22 94 = —33 35 
(41) =<§ 3 = 54 = 25 ur “ —35 16 
(421°) 12 —9go 209 —15 20 2 —2 192 99 —48 
(42*) —3 24 —62 5 —12 —20 ) —64 —23 14 
(431) -6 4 — 108 76 —— —46 176 —107 -74 31 
(4°) I -8 20 —16 3 8 —32 20 14 -4 
(51°) 4 —29 64 = 4 27 —-91 58 27 —16 
(521) -6 46 -II0 4 -8 ~42 ™59 —II9 —49 37 
(53) 2 —16 40 —31 2 16 —67 54 27 — 23 
(61*) -3 22 —49 34 = —2i 71 —4 —22 Ir 
(62) 2 —16 41 —36 14 —56 15 -1 
(71) 2 ag 34 —23 2 I =— 31 19 - 
(8) -1I 8 —20 16 -2 - 32 —24 —12 8 
w =8 (ii) | huh Ialgh* yh = ahh, hé high? = Iighgh, agg heh,* hehs hah, he 
(1°) 1680 840 420 280 7° 336 168 56 56 28 8 1 
(21°) 1020 $25 270 185 50 228 117 41 43 22 Y I 
fait 606 32 173 122 36 150 80 30 32 17 6 I 
2°1*) 354 19) III 80 26 96 5. 22 23 13 5 1 
(2*) 204 120 72 52 19 60 3 16 16 to 4 I 
(315) 500 265 140 10° 30 136 71 26 31 16 6 I 
(321°) 286 160 9 66 22 5 47 19 22 12 5 I 
(32"1) 162 96 57 43 16 52 31 14 15 g 4 I 
371") 126 76 45 36 14 44 26 12 14 4 1 
(3*2) 70 45 29 23 10 2 17 9 9 6 3 I 
(41°) 209 115 63 47 16 73 39 15 21 11 5 I 
3r 68 
8 40 31 12 43 25 11 14 8 4 I 
(421°) -76 211 ™ 
9 26 20 25 I 8 9 6 3 I 
(42") “(a ‘ , 
17 20 13 7 5 3 qT 
(431) 40 =—I2I 32 
(4 qi —s 8 6 4 4 3 2 I 
~ 28 —12 —24 10 s 
s 34 19 13 7 4 I 
(st) — 26 60 —14 —28 4 26 
12 6 8 5 3 1 
(sa1) 39 — 105 28 54 -8 -39 
a. 3 2 i 
(s3) —13 39 =§ —31 4 13 =o 23 
a 7 + 3 I 
(61*) onl 
21 —50 14 23 —4 —21 29 -8 21 
~ ee i 
(62) -14 42 —20 —16 4 14 —28 8 -—14 20 
2 I 
(71) —15 38 -8 —23 4 15 —23 8 —15 8 15 
(8) ~ 
Ss =—% 8 16 -4 -8 16 -8 8 -8 -8 “3 























Table 4.9 



































2=0G 
9 Gi) h! hab,’ hath’ 
h, heh? hath, me 
1 a 
(1%) (362880 181 Jy hyhigh,* high? 
449 90720 Iighgth;* —high,? 
(at’) x 45360 22680 hy? igh, hy? 
-8 red 47880 60480 30240 hah, hy?  yh;® 
(2*15) 5 24570 12600 15120 7560 hy 
32760 560 1008 
2 an See ae 13320 16800 861 jo 5040168 
(2°1°) 4 4 7o20 17640 © 4410. 5880 oO «15120 
(21) — 170-48 7227 «3924 9300 4900 258 jo1o «1050—s88. 
i 4 $17 9450 5130 ‘ 580 3420 1800 = 
(31° =49 2202 a7e7 660 
) ssn. «tg 504028 1515 1980 5070 
Gar" 7-87 55 20 1584 1077 pe 
321") 152 —146 I ; 894 1140 2880 
-30 2 36 40 6840 (36 644 
(32*1*) 50 | —687 52 3° 8=—- 1920 Bae 264 1620 
30 —258 686 -177. —228 3764 206. 1370 51 
(32°) 5 41 - 28 1039 4 4128 1 © 4020 
am 789 222 : 508 822 
(371°) 36 =p8t 23 1x1 ae 666 324 2250 
134 se. aw 868 
“ 10 -8 —50 x 493 20 
(3221) 4 ae 30 3. 208 306 496 7 1250 
? -12 10 54 8 50 295 9-132 
(3°) 5 —306 1-374 690 
I ms 308-82 - 392 —48 — 378 oh 
(415) 9 eae oo | 4k =e 149 2 950 
(421°) = 6 ‘ 7 78-1 __227. 105 
= 22 - S “8% stig: “ 
31 is - 2 
432 “2 5 p84 132 ae ees : eit aa, 
(4*1) 3 ~ 284 i ee «4 ae ae ” 10 * 
oe ae - He -1n aap 
(s1*) 74 5 “sey st —536 112 (2 3 - 1545 
( —76 403 - or 49 —-3 6 
521°) Ps 5 —41 22 50 —256 477 26 154 04 2 42 
| -4 3S eg [at.235 5 “# 
31 —27 2 27 el - —18 
(om an 87 207 — 82 1. <sh . 2 oe 9t 
, ae ee oe a 
- a 3 ‘ah > pe 
(621) : = - " 4 36 23-1 479 4 56 90 3 2 
ani 3 . i oe | . oe oe , ; 3 6 -36 
rh 3 18 fess 94 —160 —22 —31 90 Me .: —34 35 ar —18 33 
(83) Pe ée 59 7. 48 135 139 I = sas ae 
1) aS I = ~67 -15 —-18 — 229 139 I =2 15 a 
(9) Fos) ae is aoe i a 43 
=6 47 oe gah oe —105 wa | 15 79-127 -6 r 
22 34°> ae 28 
w =9 (ii) \ghtg hy? 9 a —78 22 —25 —42 = 15 
1° highs*hy hgh 54 +o - wae bs = —24 
1 Aghgh,® hghsh t 9 18 _35 <= 16 
(1) 2 heh, hh, hy 27 3 17 
(217 7560 378 n* ighghy® high,’ i 
(2? 2 4515 3780 2520 hy" hghgh, hy 
fats} ae ts | ae tato 630 1 Iighy Iighy? ghgh 
fa" = SG $13 she see tame | oe ae A 
1*) | arts 516 “ 330 = 2. GS 504. 1 ah, hgh 
I 3 6 4 26 1 h 
(Cats ae | 310 ol 212 re 738 an 333 = a = as2 8% p 
ix) | 719 4 Teg 40 830 444 252 219 156 a - » - 
(3?r*) i ae ee we a aa a oe a SS i ii ae 
(321) ac oe ¢ .; ae oe = = 35 10 2 8 I 
oo cae LS i > @ a tm 183 132 56 229 > | 2s : ae 
4r5 135 2 35 86 = — = - = 151 :~ 42 4 14 : I 
6 I 
(421*) 479 455 oan 45 a v7 = 97 58 22 97 55 31 32 22 7 I 
479, 272 ; a ws 7 5 63 74 26 . = 2 3 = 6 : 
aa" 520 2 36 110.501 ak 33 49 19 86 8 17 od 13 . I 
42") 163 74 «287 = 141 27 12 53 32 20 rot 10 r| I 
—344 aS coe = a om) = - 
(431° =ae =: 67 136 10 5 9 5 I 
, 163 23 5 71 26 9 6 4 I 
-344 101 ~~ 97 58 47 r 31 16 3 1 
(432) aay Se 42 «127 ~~ (ae 7 oe 6 I 
184 77,46 SP Sees 5 I 
(4°) -154 —131 232 27 37 I 14 15 
: 112 71 46 . oe 3 
(52) 04 —69 aie a1 m 30 24 - 12 14 P 1 
38 32 a 4 
119 -7 39 21 8 7 9 1 
(521*) = 6 209 : 9 20 , 6 
—317 3 18 115 63 13 3 1 
(52") 209 200 35 47 6 7 8 
94 7 8 68 4 i 73 39 3 I 
s a ° 
(531) 80-54 4 4 228 4 Ow ... >on 
175 - . 19 26 3 25 5 I 
(54) oe ar -68 “38 ” 9 sree 
-76 74 42 3s 6 8 , oe 
(6x* a. 42 —123 ». See 9 
) vs , —@ = 28 97 20 1 e 3 
6 99 = 6r 29-14 ee Ae : 
on i as a ie 5 8 5 
166 — -13 - 38 29 6 3 : 
120 a 30 4 
(63) 95 69 - 4 
2 68 16 —32 34 =~! ’ z 
60 22 45 9 9 8 I 
(71*) 45 4s. —36 121 ee 30 13 7 . 
= -9 - 2 -18 - = 6 2 
(72) 49-50 3% 45 = 4s 88 8 5 
—64 25 13 , 9 —36 ‘ P 3 7 
(81) 55 4 =- +. “i 6 15 —36 a 4 3 
—60 32 -9 ‘ 1 26 = 27 2 @ 
= = 
) ee ae — a > ae 
18 gs —8 9 3 
36-27 Pa. eas 24 I 
—27 43 - —32 3 
18 9 -26 9 -16 _ 2 
9 9 6. 23 ‘ 
-2a7 9 ¥ =2# 
os =» e —t7 — 1 
“=F 18 * 9 17 
oo -9 I 
a 
9 
w = 10 (i)


hahah,’ 








(a) 


(ar*) 


(2*1*) 
(2*1*) 
(a*1*) 
(2') 
31") 
(321°) 
(32*1*) 





Table 4.10 
hh, hyth;* 
453600 226800 
241920 123480 
129060 67320 
68868 36756 
1476 
pe. 1 
—708 371 
56 ——s 
—290 124 
1628 —720 
~— 2555 1194 
S82 —512 
—578 4 
1296 — 582 
—315 166 
~177 74 
2 - —112 
-i1 3 
1338 88 
—218 138 
79° 7 
— 1106 528 
a 
- 11 
156 -8 
—221 96 
812 —372 
—572 292 
— 489 214 
310 — 160 
260 —120 
—5° 25 
186 —81 
— 506 236 
162 —94 
258 —112 
— 100 51 
151 66 
270 — 128 
—99 46 
= —52 
<3 59 
-80 34 
5° —25 


































Table 4.10 (continued) 
w=10 (ii) | hhh, hth = hy*hyh;* —h,*h,* ha*h, hgh, Kighgh,* —aghy*h,* hgh, highgh,? 
(1**) 75600 100800 5 25200 16800 eb og 75600 37800 18900 25200 
(21°) 42840 57120 29120 I “4 10080 43680 22260 11340 15120 
2*1° 24300 32280 16820 7 6060 25050 13050 6795 9030 
2'1* 13800 18180 9708 518. 3648 26730 7623 4071 53 re 
2‘*1* 7848 10200 5596 307 2196 —_ 444° 2442 31 
(2° 7° 5700 3220 1830 1320 100 4560 25 1470 1860 
(31" 17850 24080 12600 6580 4620 36960 19320 10080 5250 7070 
(321° 101 13560 7280 3 27 20340 10930 5860 3135 4195 
(32*1* 5783 7600 4200 2318 1686 11130 154 33 1875 2472 
(32"1) 4240 2418 1382 1017 3450 198 1125 1447 
774 
G4 5672 3156 1748 1296 8300 4650 2590 1435 1932 
— 363 299 
(3%21") _1818 1044 786 4490 2594 1495 859 1126 
. go2 — 637 1527 : 
(3%2") 626 474 2420 1442 862 517 652 
3 —282 123 —330 120 
3 1770 1074 651 393 504 
(3"1) —122 I - 
04 273 48 
(41 13290 7155 3840 2055 2795 
4 151 —116 228 —45 -30 55 
(421") 3985, 2212 1224 1645 
—744 529 —Tr09g0 227 148 — 240 1091 
(42"1*) 12 732 960 
42 910 —551 1236 —282 -174 243 — 1166 I 
(423) I 556 
“ — 160 69 + =—164 42 20 —34 178 — 248 3 
(431°) th) 
502 —410 82 — 162 —122 168 -77 814 —112 "622 
(4321) am oe = 1248 276 210 — 198 97 ~eE 184 —752 
%) 116 —83 228 —48 —55 24 - oe I —20 104 
(4*1* —156 129 — 240 45 30 se 2 — 284 47 — 224 
2 112 — 66 152 —29 —22 28 — 148 200 — 54 112 
(s1° —135 99 — 200 r+ 26 - 206 — 205 26 —140 
521° —355 755 — 182 a 163 —735 774 — 104 514 
521 — 446 217 —510 162 -99 476 —554 84 —323 
31° —332 259 —543 120 78 — 103 477 — 49 5 —389 
(S32 ) —130 331 —122 —4!1 50 —— 2 —36 189 
SAY 181 —140 277 —61 —35 5 ji 299 —40 245 
(s* —45 25 —-55 20 5 —I0 50 —55 5 —45 
61° 119 —84 17 -33 —26 41 —176 180 —26 11s 
(621* —328 221 - 102 728 — 100 459 —512 86 —307 
62* 128 —§2 128 —40 -16 —132 176 - 48 
(631 169 —144 336 —57 —65 52 —249 279 - 201 
—72 $2 —120 21 22 —20 104 —132 30 —88 
(71° —%4 69 —142 30 20 —34 146 —147 18 = 
(721 193 —115 277 - —4 $I — 239 273 —36 1s 
73, —52 $3 —144 3 3r 737 H — 102 10 - 
(81 —55 107 —23 -13 = —-11 ei -18 83 
82 -B 37 —92 31 10 ~- ge —11 26 - 
91 —49 43 —57 15 13 —I9 6 —87 10 - 
10) 4° —25 60 —15 —10 10 —50 60 —10 40° 
Table 4.10 (continued)





















































w = 10 (iii) hghghgh, highs? heh," hihs hgh,® Ighgh,* hgh,*h, hghgh,* hghghs Ighh, hs 
(1'9) 12600 4200 6300 3150 30240 = 15120 7560 5040 2520 1260 252 
(21°) 7700 2660 3990 2030 §©618480 9408 4788 3248 1652 854 182 
2*1°) 4700 1690 2520 - III00 5778 3006 2078 1080 57 132 
2*1*) 2865 1077 1584 46 6570 3510 1875 — 705 390 96 
2*1*) 17 68 990 547 3840 2112 1164 32 460 262 7o 

(25) 10 440 615 355 2220 1260 720 520 300 175 51 
(317) 3675 1330 2030 1050 9 4788 2478 1722 88 483 112 
(321°) 2240 50 1280 680 5380 2872 1530 1089 57 327 82 
(32*1*) 1303 545 800 ° 3100 1710 942 68. 376 220 
(32*1) 82 350 = 285 1770 Io1r 579 42 245 147 44 
(3*1*) 1066 432 52 356 24! — 754 560 306 7186 52 
(3'21*) 647 279 402 230 1390 02 I 348 199 124 38 
(3*2*) 392 180 246 149 780 468 282 214 130 82 28 
3°*1) 306 145 I 120 366 222 174 105 24 
(41°) 1490 570 890 470 4050 2139 1128 803 422 242 62 
(421°) 907 367 560 306 2275 1249 684 501 273 164 46 
(42*1*) 551 237 348 199 1270 726 415 310 177 110 34 
(42°) 334 153 21 130 705 420 252 190 115 73 25 
(431°) 431 189 28 162 970 5 325 252 143 30 
I 123 174 105 535 325 197 154 93 62 ee 
(4321) 357 
P 65 84 55 220 143 93 75 49 34 14 
FF Ton. 
P 126 75 350 218 134 IIo 66 48 18 
(4*1*) 249 pa 
(4*2) 49 190 124 81 66 43 31 13 
— 180 22 —49 54 
(51°) I 841 456 336 181 111 32 
166 —20 44 —20 
(521°) 480 273 207 117 75 24 
— 648 80 — 168 80 —155 561 
(52"2) 164 126 76 so 18 
465-57 109 — 58 95 -379 311 
(531") 102 61 43 16 
470 —63 136 —58 95 358 232 8 
2 12 
(s32) | _ S a 
295 41 59 28 —50 215 —193 —154 157 
(541) 22 Io 
54 — 280 32 ~108 44 —50 201 —141 —172 90 143 ‘ 
2 
(s*) 55 ~g 20 ~-§ 10 —45 4° —35 “35 15 
(61°) —146 20 —34 20 —40 130 =—% a 3 wed 
(621*) 431 —63 95 — 66 96 —339 215 205 —105 — 102 15 
{2°} —128 I 31 30 —2 9 —80 —50 3 32 —5§ 
(631 —291 53 — 63 44 =a 177 — 102 141 61 73 —10 
(64) 132 —22 39 —34 16 — 64 42 54 —20 — 44 5 
(71°) 122 -17 28 = { 34 —112 69 63 —35 —29 5 
(721) | -—247 41 —43 2 —Ss1 188 8-136 —107 77 49 —10 
EH 123 —31 I —10 17 ~ 51 51 —41 —20 5 
(81) —94 Io =~ I —27 QI —55 — 56 2 29 “se 
(82) 92 —10 19 -1 18 —-72 62 38 —36 —20 7 
(91) 78 —10 24 —10 19 — 67 39 —20 —29 5 
(10) — 60 10 —1§ 10 —10 40 —30 —30 20 20 -5 
Table 4.10 (continued)











| 
w=10(iv)| hight Aeghgh;* heh* hehgh,  hehy hzh? hyhgh, hike h,h,* highs hyh, hyo 
(17°) 5040 2520 #1260 8 210 720 360 120 go 45 10 1 
21°) 3360 =. 1708 868 588 154 528 268 92 7 37 2 f 
(32*s% 2190 1138 591 113 378 196 Jo 5 30 I 
(231) 1398 747 399 281 83 264 141 53 45 24 7 I 
(2*1*) 76 4 26 192 61 180 100 40° 34 19 6 I 
(2°) 540 310 180 130 45 120 70 30 25 15 5 I 
(31”) 1960 ©= 1008 518 357 98 358 183 6. 57 29 8 I 
(321°) | 1230 652 345 244 72 247 130 44 2 7 I 
(321°) 758 417 229 166 53 166 QI 36 33 I 6 1 
(32°1) é 264 152 112 39 109 63 27 24 14 5 I 
Gn 6 354 193 142 46 152 82 32 32 17 6 I 
(3?21? 386 222 127 96 34 98 56 2. 23 13 5 I 
(372*) 228 138 84 64 25 62 38 I 16 10 4 I 
(3°) 186 114 69 55 22 54 33 16 15 9 4 I 
(41° 1045 544 283 198 57 229 118 42 43 22 7 I 
(421°) 627 340 184 133 42 15! 81 31 32 17 6 I 
(4271*) 371 211 120 9 31 97 55 23 23 13 5 I 
(42°) 217 130 7 59 23 61 3 17 16 10 4 I 
(431°) 303 173 9 75 27 86 4 20 22 12 5 I 
(4321) 175 106 64 50 20 53 32 15 15 9 4 I 
(43) 7 52 34 28 13 27 18 10 9 6 3 I 
(471°) 12 78 47 38 16 44 26 12 14 8 4 I 
4°2) 72 47 31 25 12 2 17 9 9 6 3 I 
(51°) 501 266 141 101 31 136 71 26 31 16 6 I 
(521°) 287 161 67 23 5 47 19 22 12 5 I H 
(521) 163 97 58 44 17 52 31 14 15 2 4 I 
(531°) 127 77 46 37 15 4 26 12 14 4 1 
(532) 71 49 30 | Ir 2 17 9 2 6 3 1 
(541) 48 32 21 I 9 20 13 7 5 3 1 
(5*) 16 12 9 8 5 8 6 4 4 3 2 I 
209 115 63 47 16 73 39 15 21 Ir 5 1 
(61°) — 
(621") 63 40 31 12 43 25 11 14 8 4 1 
—96 261 
o 26 20 25 16 8 9 6 3 1 
(62*) 
24 78 44 
17 8 20 13 7 8 5 3 I 
(631) 8 
48 —141 32 112 
5 8 6 4 4 3 2 I 
(64) | _\6 by a —_ 
54 22 44 34 
(71°) 34 19 8 13 7 4 I 
—34 78 -18 —36 10 34 
(721) 12 6 8 5 3 I 
ann SI —137 36 7° —20 —§I 100 
at © ee, ee 
(73) -17 51 -10 —4!I 10 17 4! 31 
® a , 3 : 
(81") 27 —64 18 29 -10 —27 37 —10 27 
oy). 
(82) -18 54 —26 —20 10 8 —36 10 —18 26 
(01) = 
i —19 @ -t0e -% 10 "9 29 Io —19 10 19 
(10) ee as po ve = Wy uM — 
30 10 20 10 10 20 10 10 10 10 10 





































Table 4.11 
wari) | Ay" igh? = Iag®hyx? = Ig*g® Igy? Ig’ ighg® —Iighghs* ight! gh*h,* 
(1) 39916800 19958400 9979200 4989600 2494800 1247400 6652800 3326400 1663200 831600 
I 
(21") 10160640 5171040 2630880 1338120 680400 3507840 1784160 907200 461160 
—10 101 
(2"1”) 2678760 1387260 718200 371700 ©1844640 955080 494340 255780 
36 — 368 1361 
(2*1°) 731520 385740 203400 967680 510300 269100 141900 
—56 581 —2190 14 
(219) 207324 IIIsio 506520 272160 146340 78744 
35 -3.2 1430 — 2444 1742 
(2') 61260 264600 144900 79500 43710 
" pe Ti pe iit 5307680 —Ba08o 6160 185640 
I 2 3561 185 
r® 
(G1) 9 —9gr 331 —519 325 9S 84 
(321°) 193740 102990 
—56 574 —2123 3398 —2182 380 — 530 3411 
(32"t*) 105284 57156 
105 — 1095 4140 — 6818 4544 —828 1005 — 6634 13359 
3,2 1731 
(32°*) je 640 — 2494 4284 —3035 602 —576 3926 —8298 ~~ 
(32% 5 —55 224 —412 32 —85 47 332 7. 562 
G 1° ar —216 799 — 1270 7 —132 206 — 1332 25 — 1488 
3*21° — 60 630 — 2392 3929 — 2566 438 — 598 3994 — 8085 4926 
3°21 30 =—3 1278 —2214 1557 — 282 300 — 2091 452 — 3095 
(g*1* 10 —-1 4°5 — 663 420 — 66 105 —714 145  - 
3°2) = 44 -177 312 —219 34 —42 303 7 ¢ 7 
(41 4 I —205 464 -3 $2 -75 471 —88 510 
(421° 2 — 432 1604 — 2584 I —312 400 m4 f) — 2961 
Ga - 30 — 2402 —274 —578 3824 —7717 45 
42°1 20 —216 56 —1510 112 -2 192 —1319 2820 —1954 
(431° —30 310 —11§2 1843 —1178 207 — 298 1925 —3713 2151 
(4321° 60 — 636 2442 — 4074 2742 — 516 606 — 4074 296 — 5118 
432" —12 132 — 534 3°3 —734 170 —120 849 — 1879 1352 
43° 12 129 — 501 37 —547 94 —129 891 -1 1103 
(471? Io — 104 389 — 628 II -7 101 —652 1250 —72 
(421 —12 129 — 504 862 - 130 —123 836 —1722 108: 
ty 3 29 132 —230 162 -34 33 —234 $02 316 
SI 7 ~7 259 <= 258 —44 66 —415 787 —458 
(sar%) —30 310 — 1157 1875 — 1229 219 — 288 1860 —3647 2220 
(s52*1 30 —318 1227 — 2079 1449 —282 291 — 1950 4027 — 2645 
(s2* -4 ait ad 329 — 264 7° —38 266 — 589 442 
(5317 20 - 77 — 1252 800 —132 202 —1316 2571 — 1526 
(5321 —24 258 — 1008 1718 —1178 204 —246 1690 — 3568 2354 
53° 3 —33 132 —229 15S —22 33 —237 52 —355 
(541" —12 126 —476 776 —Sir —124 810 — 157 3 
2 6 - 266 —476 360 —84 62 — 434 940 a 
Gh 3 —32 123 —204 135 —22 32 —214 43° —270 
1 * 61 — 223 352 —223 39 “ae 359 — 652 393 
{ga 20 — 208 782 —1278 846 —156 194 — 1263 2500 —15§22 
62"1 -12 129 — 507 881 — 636 130 —117 799 — 1699 1164 
(631° —12 126 —476 773 — 49 86 —124 819 — 161 949 
32 6 — 66 266 —47. 34 — 68 62 — 440 78 —JO4 
re 6 —64 a6 - 272 —52 64 — 428 52 — 505 
(65 -2 22 — 88 15. —110 21 —22 154 —330 225 
(71° 5 =e 187 —2 188 —33 —30 577 —333 
ba —12 126 — 479 793 — 533 102 —118 7 — 1563 g61 
72 a> 13 — 245 194 —5°0 29 — 203 — 322 
(731 3 —64 é — 406 262 —44 64 — 434 a; —§20 
4 -2 22 — 83 154 111 2 —22 154 —32 208 
(81 -4 41 —151 240 —153 2 -39 247 472 277 
(821) 6 — 64 = — 420 288 —52 60 — 405 842 — 545 
(83 -32 22 - 153 — 105 17 —22 157 —345 230 
(91 3 -31 115 — 184 119 -—22 30 —191 3es —213 
92 -2 22 —89 161 —124 29 —20 140 —309 227 
(10,1 2 ar -79 127 —80 13 —21 137 —265 150 
(11 1 ~It 4 -77 55 —11 11 -77 165 —Ito 
Table 4.11 (continued)














w=11 (ii) Ish,’ hi h,' Avhh? —hy*hy*h, hh? hy* hy hgh,’ Iighgh;® = Iighy*h® —s ag hy’ hy, 
(1") 415800 1108800 554400 277200 184800 9 1663200 831600 415800 207900 
(21°) 234360 614880 312480 158760 107520 54! 922320 720 238140 120960 
(2*17) 132300 340200 175980 Q1000 62580 32340 507780 262710 135870 70245 
(2*1°) 74820 187740 52200 36420 19200 277830 146520 77265 
aint 42402 103320 55620 29964 21180 11424 151200 1360 43812 23610 
(251 56700 31200 17210 12300 6810 1900 45000 24780 13680 
(31°) 90600 249760 129920 67480 47040 24360 381360 198 102900 53340 
(321°) 54660 137720 73100 38740 27420 I 207060 109950 58290 30855 
(32?1*) 30996 75700 41068 22256 15972 640 111930 60750 32939 17841 
(32*1*) 17622 41480 23032 1279. 9288 5163 60270 33450 18573 10317 
2") 10050 22660 12892 735 5388 3090 32340 18360 10452 5970 
Ss 
(315) 55480 30340 16540 12060 6540 82180 45110 24670 13445 
117 546 
(3*219) 17010 9516 7026 3918 44030 24740 13873 7762 
—402 — 1638 5235 
(3%2*t) 5478 4080 2352 23520 13530 7788 4485 
272 828 — 2872 1923 : 
(331") 3098 1792 17010 9930 $787 3363 
62 315 — 1043 540 245 
(3°2) 1081 9030 5400 3237 1944 
—34 —126 472 —357 —98 Ss 
(41) 129990 69405 36960 19635 
—44 — 182 517 —255 —87 33 70 
(421°) 38010 20760 11310 
270 999 — 2934 1503 504 — 198 —373 2044 
(421?) 11642 6519 
486 —1478 4561 —2483 -795 331 539 -3063 ~ 4843 ae 
3 
(42°) - 489 — 1598 974 262 - ~ — 183 1092 — 1870 
(431° — 183 ~% 2318 — 1145 —428 15 277 — 1532 2278 
(4321° 476 1676 — 5305 2820 1041 = — $52 31 — 5096 1896 
(432* —170 —327 1126 —732 -1 102 111 — 682 1208 — 560 
(4371 - 90 — 400 1341 —675 —327 111 111 — 662 1085 —380 
(471° 68 271 —761 365 131 “# — 100 560 —839 300 
(4721 —118 -3 1076 —552 — 204 120 —716 1192 — 488 
(4°3 3 1 — 364 180 92 sat | —30 188 —332 136 
(51° 3 159 —457 237 75 —33 —62 327 = 155 
(s21* —105 -717 2140 —1177 —354 170 270 —1471 2185 - 
(52*1*) 262 740 — 2348 1425 384 —213 — 273 1553 — 2457 346 
(52° —70° —93 308 — 200 - 22 8 —232 414 — 204 
(531° 120 544 — 1616 868 287 —136 — 188 1037 15 
(5321 —196 = 2276 — 1446 —4!1 258 222 — 1290 = ~ 758 
(53" 22 105 —385 267 8 —65 —27 162 —265 
(541? —86 — 340 967 — 498 —159 124 - 104 —36 
342) 84 172 — 560 343 90 —43 —62 3 ag 
(5*1) 21 91 —270 161 42 —26 —32 182 —270 9 
(61°) —33 — 138 402 — 202 -7Jo 28 i+ —285 412 —140 
(621°) 136 — 1493 808 263 —118 — 183 1006 — 1528 556 
(62*1) —118 - — 642 — 162 102 III — 644 I — 444 
(631* -78 —347 1062 — 540 —210 84 115 — 646 980 —337 
{@32 174 —615 447 107 —85 —56 336 — 570 242 
O41 46 I — 557 270 109 —$7 —64 372 — 583 214 
( 5 —21 — 66 215 —141 —-37 26 22 —132 215 - 
(71 29 117 —342 189 I —242 — 46 243 —352 11 
(721" —94 — 303 947 = —177 66 112 — 625 972 —357 
(72*) 50 72 — 24 155 39 ca. | 2 176 -oe I 
(731) 42 190 — 616 306 139 —46 -5 335 =% I 
4) —25 - 222 — 108 —50 11 22 — 136 236 — 100 
(81°) —22 —96 279 — 148 -@ 22 3 —201 9 -97 
(821 46 157 —515 313 9 —52 —57 324 —515 192 
cH -17 —69 250 — 162 - 3 35 19 —114 190 —68 
(91 ; 20 76 —215 108 35 —14 -30 161 —233 81 
(92. —29 $51 173 —120 —25 17 20 —120 —98 
(10, 1) —1I1 —58 170 —81 —32 Ir 21 —116 170 —54 
(11) Ir 33 —110 22 —11 -11 —11r0 4 
Table 4.11 (continued)














i =I1(iii)} Aghyh,* 



































Ighighsh;* Iighghy —Iyghth, hth, hehsh, hh, hshy® —ighghy* —agha*hy® ag hy? hshsh, 
(1") | 277200 138600 69300 46200 69300 34650 II§50 332640 166320 83160 41580 55440 
(21°) 161280 1900 41580 28140 42210 21420 7350 196560 99792 50652 25704 34272 
(2717) 93450 48300 24955 17150 25620 13230 4690 §=114660 59262 30618 15813 21042 
(2515) 53910 28425 14085 10455 15480 8160 3000 66150 3488 18393 9696 12834 
(2*1) 30960 16692 9006 6372 9306 5025 1923 37800 20 76 10992 5934 7776 
a3 17700 9780 5420 3880 5565 3 1235 21420 11820 6540 3630 4680 
31°) 71680 37100 19180 13300 20230 10430 3710 92400 47712 24612 12684 17024 
(321°) 41270 21815 IIS15 120 12230 6440 2380 52620 27798 14664 7725 10334 
(32*1*) 23642 12797 6919 4957 7344 3966 1§30 29730 16094 8703 4701 6234 
(32*1*) 13478 7489 4163 3023 4382 2437 985 16680 9264 5148 2862 3736 
(32*) 7648 4372 2510 1840 2599 1495 635 9300 5304 3036 1746 222 
(371° 18050 9810 5315 3860 5820 3140 1220 23060 12574 6830 3607 497 
(321°) 1025. 5733 3197 2359 3458 1926 788 12830 7192 4021 2242 2972 
(3271) 5798 334< 1927 1438 2042 1179 509 7100 401 2362 1363 1762 
(3°1*) 4386 2553 1479 1127 1602 930 410 5370 313 1827 1059 1392 
(3°2) 2460 1482 94 685 936 567 265 2940 1770 1068 645 816 
(41”) 26635 14105 7455 5320 8260 4340 1610 37590 19761 1037 5439 7343 
(421°) 15215 265 4480 3255 4970 2680 1040 20805 11256 607 3276 4397 
(421°) 642 4830 2696 1990 2964 1649 673 11460 6386 3555 1977 2616 
G2"1) ms 3 2815 1626 1214 1754 1012 436 6285 3609 2076 1197 1546 
(431°) 573 3697 2070 1555 2356 1310 540 8615 4871 2738 1531 2069 
1237 
e 2151 1248 951 1384 801 351 4700 2742 1595 925 1218 
(4321") —2551 5904 
8 755 579 808 489 228 2555 1538 928 562 712 
(432*) 
509 — 1320 426 
“ 456 630 385 185 1890 1159 709 432 557 
(43*1) 
576 —1464 294 
(421°) 950 549 243 2990 1782 1052 616 834 
—465 920 — 188 — 188 200 ‘ 
P 334 159 1610 992 609 373 484 
(421) 578 —1360 325 320 —240 383 
(43) 85 630 410 267 174 216 
43 — 166 456 —1Ir —161 60 —119 69 
(1°) 13326 7185 3864 2073 2819 
—239 472 —95 —91 84 —96 22 59 
(s21*) 4010 2232 1239 1665 
1090 — 2249 485 440 —391 470 —110 —256 1160 
(s2%1*) 1289 744 976 
— 1135 2532-624 = — 495 413° —552 132 259 —1240 1447 
(s2') 450 568 
157 —392 134 66 —65 109 —25 —34 175 —238 70 
(s31*) ra] 
—845 1708 —340 —360 316 —360 88 176 —815 839 — 106 651 
(5321) 1024 — 2363 550 529 —359 484 —130 —214 1060 — 1257 166 —826 
(53") —140 357 —72 —1rI I —60 23 27 —140 17 —16 121 
( ae 593 —1Isi 239 217 — 262 288 —61 —112 535 — $7 86 —452 
542 —300 724 —214 —137 131 —2I10 50 54 — 280 362 —84 214 
(5*x) — 160 307 —63 —53 72 — 68 11 32 — 160 181 —21 144 
(61°) 204 —417 85 6 —69 86 —22 —5§2 221 —21 2 —146 
(621°) —731 1576 —352 —330 254 — 336 88 175 “a 82 =< 533 
(62*1) 452 — 1060 296 206 — 163 235 —59 —107 51 —612 100 —342 
(631* 529 — 1140 231 271 “- 250 —73 —107 2 —510 66 — 393 
{ 32. —255 630 —192 -—129 7 —126 34 56 —285 354 —50 216 
re —321 684 —138 —1590 141 — 186 50 56 — 269 291 —46 230 
(65) 115s —252 63 8 —5§2 63 —I1 —22 IIs —141 21 —104 
(71%) —174 363 —73 —80 57 —-74 22 4 —— 192 —27 122 
(721*) 453 —1037 233 244 —150 222 —73 at 489 -53 88 —317 
7a =—333 jo2 — 104 —61 45 —79 25 27 —135 17 — 50 76 
(731 —281 675 —132 —201 91 —144 62 5 —253 27 —36 199 
(74) 118 —300 75 89 —50 9 —39 7 g _ 108 25 —72 
(81°) 146 — 293 59 59 —50 60 —15 —38 163 — 164 20 — 108 
(821 — 233 548 —126 —124 76 —114 34 57 — 267 305 —40 176 
(83 95 —252 57 81 =—=e 45 —a9 —19 95 —114 II —76 
(91 —122 237 51 —43 46 —5§2 Ir 30 —131 132 -20 92 
(92) 82 — 204 69 33 -31 51 —11 —20 100 —129 29 —62 
(10, 1) 95 —192 33 43 oy 43 <a —21 95 —96 II —714 
(11 = 132 -33 a. 22 —33 Ir 11 —55 66 ae 44 
Table 4.11 (continued)
w=11 (iv) | hghghgh, Ashs* fighgh* Iighgh, = ag*h, —saghs®—Iaghtgh;® Iigha*h, Iighshy* highgh, ighgh, igh
(1") 27720 9240 13860 6930 2772 55440 27720 13860 9240 4620 2310 462 
(21°) 17388 5964 8946 4530 1890 60935280 =: 17892 9072 6132 3108 1596 336 
(271) 10864 3850 5754 2968 1288 22050 11382 5873 4032 2079 1099 245 
(2318 6765 2487 3684 1941 876 10 7149 376 2629 1385 754 179 
(2‘1? 4200 1608 2340 1269 594 220 4440 2400 1700 920 S15 131 
fan 2600 1040 1485 830 401 4920 2730 1520 1090 610 350 96 
31°) 8764 3108 4718 2422 1078 19040 9772 5012 3444 1764 938 210 
(321°) 5437 2006 3023 1585 736 11530 6058 3179 2229 1167 643 154 
(32*1*) 3365 1297 192. 1036 500 90 3719 2005 1433 771 439 113 
(32°1*) 207 840 121 677 338 4070 2263 1259 914 509 298 83 
(32*) 1280 544 763 227 2380 1366 788 578 336 201 61 
(3*1°) 2686 1044 158. $33 424 5690 3074 1655 1198 641 374 98 
(3*21°) — 677 99: 553 286 3320 1854 1032 762 422 254 72 
(3421) 101 440 624 361 192 1920 III0 642 480 278 171 53 
(3*1*) 807 355 510 294 162 1530 894 519 398 229 146 46 
Gia) 95 232 315 192 108 7° §? 321 247 151 97 34 
41°) 3843 1414 2177 1134 546 9275 4816 seg 1743 903 497 11 
(421°) 2365 913 I 89 743 374 5410 2896 15 1107 590 339 8 
(42*1*) 1453 591 7 6 254 3125 1730 957 699 386 230 6 
42°1) QI 383 550 318 171 1790 1027 591 438 253 155 4 
431‘) 1151 475 726 399 218 2485 1386 769 575 316 196 57 
(sary 705 309 452 260 146 1410 818 473 360 207 132 42 
(432? 431 201 279 170 7 795 480 291 223 136 8 31 
(4371) 340 163 227 138 2 615 378 231 183 111 75 27 
(471°) 484 216 332 189 114 990 580 337 264 151 102 34 
(4°21) 295 141 203 123 75 55° 337 206 163 99 68 25 
tp 141 75 8 65 41 230 151 99 81 53 38 16 
st) 1508 582 482 254 4051 2140 1129 804 423 243 63 
(521° 922 377 575 316 17 2276 1250 685 502 274 165 47 
(52*1*) 563 245 360 207 11 1271 727 416 311 178 111 35 
(s2*) 343 159 22: 136 79 706 421 253 191 116 74 26 
(531°) 443 197 29 170 102 971 505 326 253 144 95 31 
(s3a1) 270 129 183 111 68 536 326 198 155 04 63 23 
1305 
P 69 90 59 38 221 144 94 76 50 35 15 
(s3) | _228 “ts 
(s4x") 135 81 54 351 219 135 111 67 49 19 
534 — 63 390 
(542) 53 35 191 125 82 67 44 32 14 
+ —326 37 — 183 160 
(s*r) 26 112 76 51 44 29 23 Ir 
. —191 26 —128 52 56 
(61) 1546 841 456 336 181 III 32 
174 —22 87 —44 —22 51 
(621°) 480 273 207 117 75 24 
—685 or -331 182 89 -170 "608 
62"1) 164 126 76 50 18 
(62% 518 -72 223 — 146 — 68 102 — 402 
(632") 102 61 43 16 
485 —72 260 —133 —as 102 —378 235 293 
(632) _40 28 12 
3 — 366 61 —135 92 52 —51 216 —192 —147 152 
(641) 22 10 
9 —270 37 —198 102 57 —51 200 —132 — 168 14 136 ‘ 
(a) 156 —26 a). ee mW . [=e 63 63 eS 41 
Or —150 19 —69 38 16 -4 146 —84 —84 39 39 —11 
(721 429 —57 185 —121 —43 I —381 241 230 =—II7 7 33 
(72* —116 11 —56 54 Ir —27 108 -—90 —56 o 3 -1II 
(731 —273 46 —114 72 27 -5 7. sg --—E 8 2 23 
(4 9. -11 61 —50 —33 I -72 47 61 —22 —50 11 
(81 13 —19 62 —31 —16 38 —125 77 7Jo —39 —32 Ir 
(821 —276 46 —95 62 27 -57 210 -—152 —1I9 86 54 —22 
(83 138 —-35 33 —22 —1Ir 19 —76 37 57 — 46 —22 Ir 
(91 — 104 II —62 31 16 —30 101 —61 —62 31 32 22 
92. 102 -11 42 —40 —1I1 20 —80 69 42 —40 —22 11 
(10, 1 86 -I1 53 —22 —16 21 —74 43 53 —22 -32 +e 4 
(11 — 66 11 —33 22 11 —11 44 —33 —-33 22 22 —11 
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Table 4.11 (continued) 














w=11 (v) 





hahy* hyhgh;* hyhy hyhgh, Iihy —Igh;® —Iighghy gh, Iigh* —Iigh, iy hy 
(1%) | 7920 3960 1980 1320 330 999 49 165 110 55 II I 
(21) | 5472 2772 «1404 «394 747 3 129 91 46 10 I 
fart 3702 1908 = 983 674 183 552-284 10° 74 38 2 1 
2°15) | 245 1293 681 475 136 3 210 77 59 31 I 
(2*1*) | 159) 64 468 332 101 282 153 59 25 7 1 
(2°) 1020 570 320 230 75 195 110 45 35 20 6 I 
(31°) | 3392 1732 84 162 529 269 93 7 37 2 I 
(321°) 2218 ~=61159 605 422 120 379 197 71 5 30 I 
(32*1*) sass 765 4u1 293 89 =. 265 142 54 45 24 7 I 
(32*1*) 96 499 278 202 66 181 101 41 34 19 6 1 
(32) 556 322 188 138 4 121 71 31 25 15 5 I 
(3*1) | 125 670 +9357 = 256 7 ~ 4 131 49 44 2 7 I 
(3 21°) 77 432-239 176 58 167 92 3 33 I 6 I 
3°2"1) 476 276 I 120 4 110 64 2 24 14 5 I 
(3°1*) 402 234 135 104 3 99 57 25 23 13 5 I 
bs) 147 90 7Jo 28 6 32 19 16 10 I 
41") | 1901 1009 519 358 99 «= 35 183 St 57 29 3 I 
(421°) 1231 653 346 5 73 247 130 44 2 7 I 
(42*1°) 759 41 230 107 54 I 91 36 33 I 6 I 
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Some procedures for comparing Poisson processes or populations 
By ALLAN BIRNBAUM,* Columbia University


1. INTRODUCTION 


The problem of comparing two Poisson processes arises in a variety of applications. For example, if
a manufactured substance like cloth, paper, or wire is inspected continuously for flaws, the number
x of faults observed in any length t may often be assumed to have the probability distribution


$p(x, \lambda, t) = e^{-\lambda t}(\lambda t)^x / x! \quad (x = 0, 1, 2, \ldots),$


where $\lambda$ is the mean number of faults per unit length. Hence the problem of comparing two kinds of
material as to flaws is statistically a problem of comparing the parameters $\lambda_1$, $\lambda_2$ of two Poisson processes.

Alternatively, if a Geiger counter is used to record emissions from a radioactive substance under
essentially constant conditions, the number x of emissions recorded during any duration of time t may
be assumed to have the probability distribution $p(x, \lambda, t)$, where $\lambda$ is the mean number of emissions
recorded per unit interval of time. The problem of comparing the emission rates of two such substances
observed under comparable conditions is statistically again the problem of comparing the parameters
$\lambda_1$, $\lambda_2$ of two Poisson processes.


2. SOME EXPERIMENTAL DESIGNS


A simple experimental procedure which will be convenient for many such problems is the simultaneous
observation of the two processes. Assume that two such processes characterized by distributions
$p(x_1, \lambda_1, t)$, $p(x_2, \lambda_2, t)$ are observed simultaneously until the first ‘event’ occurs, i.e. until either $x_1 = 1$
or $x_2 = 1$; in terms of the above examples, an ‘event’ consists of the observation of a flaw or the recording
of a count. (The simultaneous occurrence of two or more ‘events’, either in the same or in different
processes, has probability zero; hence this possibility may be ignored.) Then the probability that the
first event occurs in the first process, on the condition that between $t$ and $t + \Delta t$ units of inspection are
performed before it occurs, is


$\lambda_1/(\lambda_1 + \lambda_2) + \epsilon,$


where $\epsilon$ approaches 0 with $\Delta t$; this is seen to be independent of t. Hence if p is the probability that the
first event is observed in the first process, $p = \lambda_1/(\lambda_1 + \lambda_2) = [1 + (\lambda_2/\lambda_1)]^{-1}$, which depends on $\lambda_1$, $\lambda_2$
only through their ratio $\gamma = \lambda_2/\lambda_1$. If such simultaneous inspection is continued, and if we set $y_i = 1$
or 0 according as the ith event is observed in the first or second process, the experiment is seen to provide
a sequence of independent Bernoulli observations $y_i$ such that


$\Pr\{y_i = 1\} = p,$
$\Pr\{y_i = 0\} = 1 - p \quad (i = 1, 2, \ldots),$
where $p = 1/(1 + \gamma)$.

For many (but not all) purposes it is appropriate to express comparisons of the two processes in
terms of the ratio $\gamma = \lambda_2/\lambda_1$. Statistical questions concerning $\gamma$ may then be answered by use of the
various methods available for dealing with Bernoulli observations.

Problem 1. Suppose that it is required to test whether $\lambda_1 = \lambda_2$ against the alternatives $\lambda_1 > \lambda_2$, with
significance level $\alpha = 0.05$, and with power $1 - \beta = 0.95$ when $\lambda_2 = \tfrac{1}{2}\lambda_1$. The requirement may be met
by any test, based on the observations $y_i$, of the hypothesis that $p = \tfrac{1}{2}$ against the alternatives $p > \tfrac{1}{2}$,
with significance level $\alpha = 0.05$, and power $1 - \beta = 0.95$ when $p = \tfrac{2}{3}$.

Method 1. Using tables of the binomial distribution (or tables of the normal distribution for
approximation) we find the smallest value of n such that, for some corresponding critical value $r_n$, we have


$\Pr\{\sum_{i=1}^{n} y_i > r_n \mid p = \tfrac{1}{2}\} \le 0.05$


and $\Pr\{\sum_{i=1}^{n} y_i > r_n \mid p = \tfrac{2}{3}\} \ge 0.95.$


* This work was begun while the writer was employed by the National Foundation for Infantile 
Paralysis and was completed with the support of the Office of Naval Research. The writer is indebted
to the referee for helpful comments on an earlier draft of this paper and in particular for suggesting 
the method used in §3. 
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(By use of the randomization procedure described, for example, in Tocher (1950), these inequalities 
can be replaced by equalities, and these requirements may be met by a smaller value of n than before.) 
Using the normal approximation, we find these requirements satisfied approximately by n = 130
and $r_n = 77$.

The appropriate procedure is, therefore, to observe the two processes simultaneously until a total
of 130 events has occurred, and then to reject the hypothesis that $\lambda_1 = \lambda_2$ if and only if 78 or more of
the events occurred in the first process. An appreciable shortening of the average duration of the
experiment can be achieved by the following modification:


Method 2. Without altering the size or power of the test we modify the preceding method to obtain 
the following curtailed-sampling procedure: Observe the processes until either (a) the number of events 
observed in the first process equals 78, in which case reject the hypothesis, or (b) the number of events 
observed in the second process equals 53, in which case accept the hypothesis. 


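Method 3. The average duration of the experiment can be minimized, whether $\lambda_1 = \lambda_2$ or $\lambda_2 = \tfrac{1}{2}\lambda_1$,
by use of Wald’s (1947) sequential probability ratio test. Let $p_0 = \tfrac{1}{2}$ and $p_1 = \tfrac{2}{3}$ denote the values of p
under the hypothesis and under the alternative, and let


$a = \log\frac{1-\beta}{\alpha} = \log\frac{0.95}{0.05} = \log 19, \quad b = \log\frac{\beta}{1-\alpha} = \log\frac{0.05}{0.95} = -a,$

$h_1 = a \Big/ \log\frac{p_1(1-p_0)}{p_0(1-p_1)}, \quad h_0 = b \Big/ \log\frac{p_1(1-p_0)}{p_0(1-p_1)} = -h_1,$

and $s = \log\frac{1-p_0}{1-p_1} \Big/ \log\frac{p_1(1-p_0)}{p_0(1-p_1)}.$


For every m = 1, 2, ..., let $z_m$ be the number of events observed in the first process, out of the first
m events observed; i.e. $z_m = \sum_{i=1}^{m} y_i$. Continue the experiment until either


(a) $z_m \ge h_1 + ms$, in which case reject the hypothesis $\lambda_1 = \lambda_2$, or
(b) $z_m \le h_0 + ms$, in which case accept the hypothesis.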
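
The stopping rule of Method 3 can be sketched in a few lines. The function names below are illustrative only, and the values $p_0 = \tfrac{1}{2}$, $p_1 = \tfrac{2}{3}$, $\alpha = \beta = 0.05$ are the ones assumed for Problem 1; the base of the logarithm cancels in $h_0$, $h_1$ and $s$, so natural logarithms are used.

```python
import math


def sprt_boundaries(p0, p1, alpha, beta):
    """Return (h0, h1, s) for Wald's SPRT on Bernoulli observations."""
    log_odds = math.log(p1 * (1 - p0) / (p0 * (1 - p1)))
    a = math.log((1 - beta) / alpha)
    b = math.log(beta / (1 - alpha))
    h1 = a / log_odds
    h0 = b / log_odds
    s = math.log((1 - p0) / (1 - p1)) / log_odds
    return h0, h1, s


def sprt_decision(ys, h0, h1, s):
    """Scan 0/1 observations y_1, y_2, ...; return (decision, m)."""
    z = 0
    for m, y in enumerate(ys, start=1):
        z += y  # z_m = number of events in the first process so far
        if z >= h1 + m * s:
            return "reject", m
        if z <= h0 + m * s:
            return "accept", m
    return "continue", len(ys)


h0, h1, s = sprt_boundaries(0.5, 2 / 3, 0.05, 0.05)
```

With these values the two boundary lines are parallel with slope s (about 0.585) and intercepts of equal size and opposite sign, so an unbroken run of events in one process ends the experiment after only a handful of observations.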


Problem 2. Suppose that it is desired to give an estimate of $\gamma = \lambda_2/\lambda_1$ by means of a confidence interval.
We may obtain n observations $y_1, \ldots, y_n$ and use the procedure of Clopper & Pearson (1934) to determine
a confidence interval $(p_L, p_U)$ for p such that $\Pr\{p_L \le p \le p_U\} \ge 1 - \beta$. Then since $\gamma = 1/p - 1$ we have
$\Pr\{1/p_U - 1 \le \lambda_2/\lambda_1 \le 1/p_L - 1\} \ge 1 - \beta$, so that $(1/p_U - 1, 1/p_L - 1)$ is the desired confidence interval.

A useful property of all procedures based on the $y_i$’s is that even if $\lambda_1$ and $\lambda_2$ vary during the course
of an experiment, so long as $\lambda_2/\lambda_1$ remains constant the procedures remain valid.

The generalization of these procedures to problems of comparing more than two processes is direct;
binomial probability distributions of the $y_i$’s will then be replaced by multinomial distributions.
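
The interval of Problem 2 can be sketched as follows. This is a minimal illustration, not the tabular procedure of Clopper & Pearson (1934): the limits are found by bisection on the exact binomial tail probabilities, the function names are hypothetical, and k denotes the number of the n events that fell in the first process.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, target):
    """Bisection for f(p) = target, f increasing in p on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, level=0.95):
    """Equal-tailed exact limits (p_L, p_U) for a binomial proportion."""
    tail = (1 - level) / 2
    p_lo = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), tail)
    p_hi = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - tail)
    return p_lo, p_hi


def ratio_interval(k, n, level=0.95):
    """Interval for gamma = lambda_2 / lambda_1 via gamma = 1/p - 1."""
    p_lo, p_hi = clopper_pearson(k, n, level)
    upper = float("inf") if p_lo == 0 else 1 / p_lo - 1
    return 1 / p_hi - 1, upper
```

For example, `ratio_interval(65, 130)` gives an interval that contains 1, as one would expect when the events are split evenly between the two processes.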


3. DURATION OF THE EXPERIMENT 


In the procedure described above to obtain the sequence of observations $y_i$, let $t_1$ denote the amount
of inspection performed to obtain the first observation, and generally let $t_i$ denote the amount of
inspection performed after $y_{i-1}$ is obtained to obtain $y_i$, for i = 2, 3, .... If we denote by m the number of
observations $y_i$ obtained during any amount t of inspection, then the probability distribution of m is
given by

$P(m, \mu t) = e^{-\mu t}(\mu t)^m / m! \quad (m = 0, 1, 2, \ldots),$

where $\mu = \lambda_1 + \lambda_2$. The $t_i$’s are observed ‘waiting-times’ between events in this Poisson process. As is
shown, for example, in Kendall (1950), the $t_i$’s are distributed according to the density function


$f(t, \mu) = \mu e^{-\mu t} \quad (t \ge 0).$

Thus $2\mu t$ is distributed as $\chi^2_2$ ($\chi^2$ with 2 D.F.); letting


$T_n = \sum_{i=1}^{n} t_i$


be the amount of inspection required for a procedure based on n observations $y_i$, we have that $2\mu T_n$ is
distributed as $\chi^2_{2n}$. If, for example, n = 10, we find (on referring to tables of the distribution of $\chi^2_{20}$) that
$\Pr\{T_n < 31.4/(2\mu)\} = 0.95$. If $\mu = \tfrac{1}{2}$, then with probability 0.95, less than 31.4 units of inspection will
suffice. Such calculations are of use in procedures like those proposed above for Problem 2, and in
Method 1 for Problem 1. For Method 2 under Problem 1, the same calculations may be used to provide
a lower bound for the probability that a specified amount of inspection will suffice; thus in the preceding
example, the probability that less than 31.4 units of inspection will suffice is appreciably more than 0.95
if Method 2 is used.
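
The figure of 0.95 quoted above can be checked without tables. Since $T_n$ is a sum of n independent exponential waiting-times with rate $\mu$, the event $T_n < t$ is the event that at least n observations occur in t units of inspection, so that $\Pr\{T_n < t\} = 1 - \sum_{k=0}^{n-1} e^{-\mu t}(\mu t)^k/k!$. A minimal sketch (the function name is illustrative):

```python
import math


def prob_inspection_suffices(t, n, mu):
    """P(T_n < t): at least n events of a rate-mu Poisson process occur in t."""
    x = mu * t
    term, total = 1.0, 0.0
    for k in range(n):
        total += term          # adds x**k / k!
        term *= x / (k + 1)
    return 1.0 - math.exp(-x) * total


p = prob_inspection_suffices(31.4, 10, 0.5)
```

The same function gives the probability for any other amount of inspection, which is convenient when a trial value of $\mu$ is to be varied.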

If Method 3 is used to minimize the average amount of inspection required for Problem 1, the formulae
given by Wald (1947) may be used to obtain E(n), the average number of observations $y_i$ required. Since
$1/\mu$ is the mean amount of inspection for one observation $y_i$, the average amount of inspection will be
$E(n)/\mu$.

In all of these procedures it is clear that the average amount of inspection required varies inversely
with $\mu = \lambda_1 + \lambda_2$, the ‘nuisance-parameter’ in our problem. If no lower bound on $\mu$ is known at the outset
of inspection, no probability statements concerning the required amount of inspection can be made. At
any time after inspection has begun, m/t constitutes an estimate of $\mu$ and so may be used to obtain rough
estimates of the additional amount of inspection required.


4. COMPARING POISSON POPULATIONS 


The problem of comparing two Poisson populations occurs in a number of applications in which it is
somewhat unnatural to conceive of the observations as if obtained by inspection of a continuing process.
For example, the numbers $x_1$, $x_2$ of cases of a rare disease observed in two large groups of individuals
during some period may be assumed to be independently distributed according to the Poisson laws


$g(x_i, \lambda_i) = e^{-\lambda_i}\lambda_i^{x_i}/x_i! \quad (x_i = 0, 1, 2, \ldots, \text{ for } i = 1 \text{ and } 2).$


Problems of comparing incidence rates in two such groups may for some purposes be formulated as
statistical questions concerning the ratio $\gamma = \lambda_2/\lambda_1$. As


$g(x_i, \lambda_i) = p(x_i, \lambda_i, 1) \quad (i = 1, 2),$


such statistical problems are formally equivalent to problems involving Poisson processes where certain
restrictions are imposed on the units and the amount of inspection which will be performed.

Each of the statistical methods proposed above for Poisson processes can be directly adapted to the
present case. To apply Method 1 for Problem 1, for example, we should continue taking paired
observations $(x_1, x_2)$ until the total $\sum x_1 + \sum x_2$ of all observations reaches or exceeds the predetermined n = 130.
Then the appropriate test based on binomial probabilities may be used.

Other applications have been discussed recently by Maguire, Pearson & Wynn (1952). 
Scale factors and degrees of freedom for small sample sizes for χ-approximation
to the range


By GEORGE WM. THOMSON, Ethyl Corporation, Detroit 20, Michigan


Patnaik (1950) has approximated the distribution function of the range in random samples from a normal
population by the use of the $\chi$-distribution. If $\bar{w}_{m,n}$ is the average of the ranges of m independent samples,
each of size n, then

$\bar{w}_{m,n}/\sigma = c\chi_\nu/\sqrt{\nu}, \quad (1)$


where c is a scale factor and $\nu$ is an equivalent number of degrees of freedom for $\chi$. It follows that $\bar{w}_{m,n}/c$
may be considered to be equivalent to the usual standard deviation estimator s with $\nu$ degrees of freedom.
The series approximation which was used to obtain Patnaik’s table of c and $\nu$ for various sample sizes
leads to considerable error for small values of $\nu$. A corrected table of c and $\nu$ was given by H. A. David
(1951, Table I) but only to the nearest 0.01 in c and 0.1 in $\nu$. This note presents in Table 1 new values of
c and $\nu$ for the single sample case (m = 1) based on recently published moment constants for the
distribution of the range in normal samples (Hartley & Pearson, 1951) and an interpolation of the $\Gamma$-function
at close intervals.

It was noticed that if these corrected values of c and $\nu$ are employed, the approximation to the
distribution of range by $c\chi/\sqrt{\nu}$ is somewhat better than was indicated by Patnaik (1950). Thus results of
Lord (1947), obtained exactly by quadrature, provide factors by which the range (w) in a single sample
of n should be multiplied to obtain confidence limits for the mean ($\mu$) of the normal population from
which the sample has been drawn. For example, 95 % confidence limits can be obtained from


$\bar{x} \pm f_{0.05}\,w,$

where $\bar{x}$ is the sample mean and $f_{0.05}$ is given in the last column of Table 1 below. If $t_{0.05}$ is the 5 % limit
of Student’s ratio (two-sided test) for $\nu$ degrees of freedom, then $f_{0.05}$ may be approximated by $t_{0.05}/(c\sqrt{n})$.
Comparison of the last two columns of the table shows that the agreement is excellent. The differences
are negligible for all practical purposes. The agreement is not unexpected in view of Pearson’s (1952)
findings that the Patnaik $\chi$ method provides an excellent approximation to the probability integral of
the range for n = 4, 6, 10, 15.


Table 1. Scale factors and equivalent degrees of freedom for χ-approximation
to range in normal samples

No. in    Degrees of    Scale      Equivalent         f_0.05 *
sample    freedom       factor     two-sided 5 % t
  n          ν            c             t           t/(c√n)    Lord†

  2        1.0000      1.41421      12.7062         6.3531     6.3531
  3        1.9845      1.91154       4.3349         1.3093     1.3039
  4        2.9291      2.23886       3.2265         0.7206     0.7166
  5        3.8266      2.48124       2.8267         0.5095     0.5066
  6        4.6772      2.67252       2.6249         0.4010     0.399
  7        5.4841      2.82980       2.5038         0.3344     0.333
  8        6.2511      2.96288       2.4233         0.2892     0.288
  9        6.9818      3.07793       2.3658         0.2562     0.255
 10        7.6798      3.17905       2.3228         0.2311     0.2301

* Multipliers for sample range to get 95 % confidence limits of the mean.
† From direct quadrature (Lord, 1947).
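
The approximate multipliers in the table follow directly from the tabulated c and t. The short sketch below recomputes $t_{0.05}/(c\sqrt{n})$ from the values of Table 1; the dictionary simply transcribes the $\nu$, c and t columns, and the function name is illustrative.

```python
import math

# n: (nu, c, two-sided 5 % t), transcribed from Table 1
TABLE_1 = {
    2: (1.0000, 1.41421, 12.7062),
    3: (1.9845, 1.91154, 4.3349),
    4: (2.9291, 2.23886, 3.2265),
    5: (3.8266, 2.48124, 2.8267),
    6: (4.6772, 2.67252, 2.6249),
    7: (5.4841, 2.82980, 2.5038),
    8: (6.2511, 2.96288, 2.4233),
    9: (6.9818, 3.07793, 2.3658),
    10: (7.6798, 3.17905, 2.3228),
}


def range_multiplier(n):
    """Approximate factor f such that mean +/- f * range gives 95 % limits."""
    _nu, c, t = TABLE_1[n]
    return t / (c * math.sqrt(n))


multipliers = {n: round(range_multiplier(n), 4) for n in TABLE_1}
```

For a sample of n = 5 with mean 10.0 and range 2.0, for instance, the approximate 95 % limits would be 10.0 ± 2.0 × `range_multiplier(5)`.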
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The third moment of Gini’s mean difference 
By A. R. KAMAT, University College, London


U. S. Nair (1936) has given an expression for the variance of Gini’s mean difference, defined by

$g = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} |x_i - x_j|. \quad (1)$

Nair’s formula applies to any parental distribution of the $x_i$, although its actual evaluation may be
tedious. Recently Lomnicki (1952) has obtained a simpler expression for the same by a more direct
approach. In this note we derive the third moment of g when the parental distribution is normal with
the help of the absolute moments of normal distribution obtained by Nabeya (1951, 1952) and Kamat
(1953). It may also be observed that this method of derivation is considerably simpler than those adopted
by Nair or Lomnicki for deriving the variance of g.

Let $x_i$ (i = 1, 2, ..., n) denote a sample from a normal population, $N(\mu, \sigma)$. Then if we denote $z_{ij} = x_i - x_j$,
$z_{ij}$ is $N(0, \sqrt{2}\sigma)$ and $\rho(z_{ij}, z_{ik}) = \tfrac{1}{2}$. For the evaluation of the first three moments of g, we require the
following expectations which are readily found by using the formulae for absolute moments (see, for
example, Kamat, 1953, pp. 26-7):


$E(|z_{ij}|) = \frac{2\sigma}{\sqrt{\pi}}, \quad E(z_{ij}^2) = 2\sigma^2, \quad E(|z_{ij}|\,|z_{ik}|) = \frac{\sigma^2}{\pi}(2\sqrt{3} + \tfrac{1}{3}\pi),$


$E(|z_{ij}|^3) = \frac{8\sigma^3}{\sqrt{\pi}}, \quad E(|z_{ij}|^2\,|z_{ik}|) = \frac{5\sigma^3}{\sqrt{\pi}},$


(|| leal leal) = (354483), 
30% 
(|u| 2a lel) = 
(|u| lel leu) = Se (<5 - ¢sin- t+sin1=), 


3 
(| 245 | | Zsx| | Ap) = <= ($ +7): 


To find the moments of g the following expectations have to be evaluated: 








\ 
&(g) = wo nna) o = |u|)» 
4 
é(9*) = Wn aye e eats + UE | 25 | Zu | +2 | 245 | | zur), sae 
&(g*) = mn 1p OE | Aes | + BEeb | een | + Zaks | Zeal) + CE | 215 | | Zen] | 2a] +2 | 2s | | Zee] | Aye | 
+E | 245 | | Ze] | 2x2] +2 | 205 | | Zee] | ero | + Z| ees | | Zee! | Z50 |), 


where i, j, k, l, p, g are all different and the summation symbols denote summation over all of them with 
the obvious limitations imposed by the definition (1). 
It can be shown that the number of terms in the sums of the various types occurring in (3) are as 





— Type of sum Number of terms 
| 245 | | Zee | $n(n— 1) (n—2), 
| 24s | | 2x2] $n(n — 1) (n— 2) (n—3), 
45 | Z| n(n—1)(n—2), 
249 | Z| $n(n — 1) (n—2)(n—3), 
| 245 | | Zee | | 200] $n(n— 1)(n—2)(n—3), P (4) 


| 249 | | Zen] | Ze | 
| 24s | | sm | | Zee] 
| 2451 | Zee] | 2101 
| 24s | | 22! | Ze | 





n(n —- 1) (n—2), 
$n(n— 1) (n—2)(n—3), 

$n(n— 1) (n—2)(n—3)(n—4), 

dgn(n — 1) (n—2) (n—3) (n—4) (n—5). 
Substituting in (3) the values of expectations given in (2) and noting (4), after some simplification,
we have the following first three moments of g:


$\mu_1' = \frac{2\sigma}{\sqrt{\pi}} = (1.128379)\,\sigma, \quad (5)$

$\mu_2 = \frac{4\sigma^2}{n(n-1)\pi}\{(2\sqrt{3} - 4 + \tfrac{1}{3}\pi)\,n + (6 - 4\sqrt{3} + \tfrac{1}{3}\pi)\}$

$\quad = \frac{\sigma^2}{n(n-1)}\{(0.651006)\,n + 0.151508\} \quad (6)$

$\quad \approx \frac{(0.651006)\,\sigma^2}{n}$ for large n,


$\mu_3 = \frac{8\sigma^3}{n^2(n-1)^2\pi^{3/2}}\{(16\sqrt{2} + 40 - 36\sqrt{3})\,n^2 + (144\sqrt{3} - 136 - 80\sqrt{2})\,n + (96\sqrt{2} + 120 - 144\sqrt{3} - 2\pi)\}$

$\quad = \frac{\sigma^3}{n^2(n-1)^2}\{(0.393063)\,n^2 + (0.399735)\,n + 0.094822\} \quad (7)$

$\quad \approx \frac{(0.393063)\,\sigma^3}{n^2}$ for large n.


For n = 3 we know that $g = \tfrac{2}{3}w$, where w is the range. It is easily verified that $\mu_3(g)$ given above checks
with the third moment of w given, for instance, by Hartley & Pearson (1951).

Although it is not possible to say much about the distribution of g on the strength of its first three
moments only, the following values of $\sqrt{\mu_2}/\mu_1'$ and $\beta_1$ suggest that it may be close to the $\chi$-distribution.
This is confirmed by the $\chi$-approximation to the distribution of the range w for n = 3, since in this case
$g = \tfrac{2}{3}w$. The fourth column of the table gives for comparison the values of $\beta_1$ for the $\chi$-distribution
having the same $\sqrt{\mu_2}/\mu_1'$.


























  n     √(μ_2)/μ_1'     β_1      β_1 for χ-distribution

  5       0.3657       0.1797       0.1672
 10       0.2411       0.0708       0.0648
 15       0.1926       0.0436       0.0398
 20       0.1650       0.0315       0.0286
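
The second and third columns of this table can be recomputed from the closed forms (5), (6) and (7). The sketch below does so for $\sigma = 1$; the function names are illustrative, and $\beta_1 = \mu_3^2/\mu_2^3$ is the usual moment ratio.

```python
import math


def gini_moments(n, sigma=1.0):
    """Mean, variance and third central moment of Gini's mean difference
    for a normal sample of size n, from formulae (5), (6) and (7)."""
    r2, r3, pi = math.sqrt(2), math.sqrt(3), math.pi
    mean = 2 * sigma / math.sqrt(pi)
    mu2 = 4 * sigma**2 / (n * (n - 1) * pi) * (
        (2 * r3 - 4 + pi / 3) * n + (6 - 4 * r3 + pi / 3))
    mu3 = 8 * sigma**3 / (n**2 * (n - 1) ** 2 * pi**1.5) * (
        (16 * r2 + 40 - 36 * r3) * n**2
        + (144 * r3 - 136 - 80 * r2) * n
        + (96 * r2 + 120 - 144 * r3 - 2 * pi))
    return mean, mu2, mu3


def shape_constants(n):
    """Return (sqrt(mu2)/mean, beta_1) for sample size n."""
    mean, mu2, mu3 = gini_moments(n)
    return math.sqrt(mu2) / mean, mu3**2 / mu2**3


rows = {n: shape_constants(n) for n in (5, 10, 15, 20)}
```

Both constants are free of $\sigma$, so the choice $\sigma = 1$ involves no loss of generality.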
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A method of systematic sampling based on order properties* 
By R. M. SUNDRUM, Ph.D. Division of Research Techniques, London School of Economics 


1. INTRODUCTION


In some theoretical work, particularly in non-parametric inference, the exact probability distributions 
of certain statistics are so complicated, at least for small samples, that we have to resort to sampling 
experiments in the laboratory to derive some constants of these distributions. In this paper the method 
of systematic sampling, which is often employed in surveys and censuses, is utilized to improve upon 
the method of random samples. In such systematic methods a ‘grid’ or ‘cluster’ of points covering the 
entire population is needed, the randomness entering into the choice of this grid or cluster. It is found 
that the order properties of samples may be used to determine such a cluster of points. 


* Part of a thesis submitted for the degree of Ph.D. in the University of London. 
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One disadvantage lies in the greater difficulty of determining the sampling errors involved. When an 
idea of the sampling error is not available, the method can be safely used only when there is some external 
check on the resulting estimates. It has not been possible to evaluate the sampling errors of the method 
proposed in this paper, but some check is attempted by using some known theoretical results. 


2. SYSTEMATIC SAMPLING FROM A BIVARIATE POPULATION 


We consider first the problem of finding the probabilities of the 120 possible rankings of five from a 
bivariate normal population. In another paper (Sundrum, 1953, submitted to Biometrika) these pro- 
babilities are used to determine the third and fourth moments of Kendall’s rank correlation coefficient 
in the non-null case. As it appeared extremely difficult to evaluate these probabilities theoretically, it 
was decided to use estimates based on sampling experiments. 

The usual method, based on random samples, of constructing samples of five from a correlated normal
population is to take independent sets, say of five X’s and five Y’s, from Wold’s Tables of Random
Normal Deviates, and then for each set to find the quantity $Z = \rho X + \sqrt{1 - \rho^2}\,Y$; then the resulting sets
of X and Z may be considered a sample from a bivariate normal population with correlation $\rho$. The
extent to which the sets of X and Z represent samples from a correlated normal population depends on
the extent to which the original sets of X and Y represent samples from an independent normal
population.

Now, if samples of five bivariate observations of X’s and Y’s are drawn from an independent normal 
population, and the X’s are arranged in order of magnitude, then it is well known that the 120 possible 
rankings of the Y’s are equally probable. In order to get better estimates for the correlated population, 
we therefore adopt a method which ensures that the samples of correlated observations are constructed 
from samples of independent observations, when these independent observations are made to conform 
exactly with the condition of equi-probability of the 120 rankings. (In the following a set of X’s and Y’s 
will be referred to only by the ranking of the Y’s when the X’s are arranged in order of magnitude.) 
The method used is as follows: five X’s and five Z’s are drawn from Wold’s Tables. The five Z’s are then
permuted in the 5! = 120 possible ways. Each of these sets associated with the set of five X’s represents
a sample from an independent normal population. If, then, we calculate for each of the 120 bivariate
observations the quantities Y = X + Z' (where Z' represents one of the 120 permutations), we get
120 sets of five Y’s each, which when associated with the X’s may be considered as samples from
a bivariate normal population with correlation $\rho = 1/\sqrt{2}$.

To illustrate the method, consider an example with three bivariate observations. From the first six
numbers of Wold’s Tables, write

  X    -0.84    1.37   -0.18
  Z     0.35    2.82    2.12

The 3! = 6 samples from an independent population are then

    X      Z_1     Z_2     Z_3     Z_4     Z_5     Z_6
  -0.84    0.35    0.35    2.12    2.12    2.82    2.82
  -0.18    2.12    2.82    0.35    2.82    0.35    2.12
   1.37    2.82    2.12    2.82    0.35    2.12    0.35

Then the samples from a correlated normal population are obtained by calculating the values of
$Y_i = X + Z_i$, thus:

    X      Y_1     Y_2     Y_3     Y_4     Y_5     Y_6
  -0.84   -0.49   -0.49    1.28    1.28    1.98    1.98
  -0.18    1.94    2.64    0.17    2.64    0.17    1.94
   1.37    4.19    3.49    4.19    1.72    3.49    1.72

In the same manner, we can also obtain six samples by taking the permutations of the X’s. Thus, with
a sample of five bivariate observations, we can generate 240 sets of five bivariate observations each,
representing samples from a correlated normal population.
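
The construction in the three-observation example can be sketched as follows. The function names are illustrative; the values are rounded to two decimals only to match the printed tables, and the ranking of a set is the ranking of its Y’s when the X’s are in order of magnitude.

```python
from itertools import permutations


def systematic_samples(xs, zs):
    """Pair the ordered X's with every permutation of the Z's and form Y = X + Z'."""
    x_sorted = sorted(xs)
    samples = []
    for z_perm in permutations(sorted(zs)):
        samples.append([round(x + z, 2) for x, z in zip(x_sorted, z_perm)])
    return x_sorted, samples


def ranking(ys):
    """Ranking of the Y's (1 = smallest), written as a string such as '213'."""
    order = sorted(ys)
    return "".join(str(order.index(y) + 1) for y in ys)


x_sorted, y_sets = systematic_samples([-0.84, 1.37, -0.18], [0.35, 2.82, 2.12])
rankings = [ranking(ys) for ys in y_sets]
```

Here `y_sets` reproduces the columns $Y_1, \ldots, Y_6$ above, and `rankings` records the ranking by which each set would be classified.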


In practice, a systematic procedure may be adopted. For the above example, we construct a 3 x 3 
table as follows: 








                    X
   Z      -0.84   -0.18    1.37

  0.35    -0.49    0.17    1.72
  2.12     1.28    1.94    3.49
  2.82     1.98    2.64    4.19
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The body of the table is filled in by adding the values of X and Z corresponding to each cell; e.g. in the 
first cell on the top left corner we put in —0-84+0-35 = —0-49. We then get a set of three values, 
correlated with X, by taking elements from the first, second and third columns, such that no two are in 
the same row. Similarly, a set of three values, correlated with Y, is obtained by taking elements from 
the first, second and third rows such that no two are in the same column. This can be done fairly quickly 
with some practice and may even be mechanized. 

This method was used to obtain samples from a bivariate normal population with correlation $\rho = 1/\sqrt{2}$.
Fifty bivariate observations of five X’s and five Z’s were taken from p. 33 of Wold’s Tables; each of these 
sets was used to generate 240 observations from the correlated population, giving in all 12,000 bivariate 
observations. When these 12,000 observations are classified, according to the ranking of the Y’s, we obtain 
the following distribution: 


12345  1,044     21345  631     31245  299     41235  123     51234  31
12354    554     21354  274     31254  124     41253   42     51243  25
12435    719     21435  404     31425  161     41325   80     51324  24
12453    282     21453  154     31452   57     41352   29     51342  15
12534    339     21534  132     31524   46     41523   32     51423  16
12543    242     21543  121     31542   32     41532   20     51432   6
13245    631     23145  301     32145  233     42135   92     52134  26
13254    381     23154  133     32154  124     42153   28     52143  16
13425    345     23415  131     32415  110     42315   67     52314  17
13452    129     23451   26     32451   29     42351   10     52341   4
13524    183     23514   53     32514   62     42513    9     52413  11
13542     95     23541   21     32541   24     42531   11     52431   5
14235    416     24135  158     34125   41     43125   63     53124  16
14253    146     24153   38     34152   14     43152   18     53142   5
14325    224     24315  113     34215   52     43215   65     53214  20
14352     95     24351   27     34251   13     43251   15     53241   7
14523     85     24513   16     34512    9     43512    8     53412   7
14532     54     24531   21     34521    6     43521   10     53421   4
15234    179     25134   42     35124   21     45123   19     54123  14
15243    107     25143   27     35142   13     45132    9     54132   5
15324    120     25314   29     35214   29     45213    5     54213   6
15342     58     25341   12     35241   12     45231    5     54231   3
15423     62     25413    9     35412    9     45312    6     54312   4
15432     39     25431   14     35421    4     45321    5     54321   2


Total 12,000


From each set of five bivariate observations, we can draw five sets of four bivariate observations, 
which may also be considered samples from the same correlated population. For example, from the 
sample represented by (15423), by omitting one observation at a time and re-ranking the remaining 
observations, we get the samples represented by (4312), (1423), (1423), (1432) and (1432). Thus, we can 
derive from the above distribution, the distribution of 60,000 observations with four bivariate points 
each. The distribution of these is as follows: 


1234  12,891     2134  6,740     3124  2,733     4123  847
1243   6,588     2143  3,153     3142    944     4132  511
1324   7,571     2314  2,788     3214  2,129     4213  586
1342   2,825     2341    748     3241    613     4231  292
1423   3,207     2413  1,017     3412    395     4312  322
1432   1,979     2431    570     3421    313     4321  238

Total 60,000
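
The step from rankings of five to rankings of four is mechanical: each observation is omitted in turn and the remainder is re-ranked. A minimal sketch follows; the function names and the small frequency table used in the usage line are illustrative only.

```python
from collections import Counter


def sub_rankings(ranking):
    """All rankings obtained by omitting one observation and re-ranking."""
    out = []
    for drop in range(len(ranking)):
        rest = ranking[:drop] + ranking[drop + 1:]
        order = sorted(rest)
        out.append(tuple(order.index(v) + 1 for v in rest))
    return out


def reduce_distribution(freqs):
    """Frequencies of rankings of k - 1 derived from frequencies of rankings of k."""
    reduced = Counter()
    for ranking, count in freqs.items():
        for sub in sub_rankings(ranking):
            reduced[sub] += count
    return reduced


example = sub_rankings((1, 5, 4, 2, 3))
toy = reduce_distribution({(1, 2, 3): 10, (3, 2, 1): 2})
```

Applied to the full table of 120 rankings, `reduce_distribution` multiplies the total by five at the first step and by four at the second, which accounts for the totals of 60,000 and 240,000.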


We can continue the same method to obtain the distribution of rankings of three observations from 
the above distribution of rankings of four. We then get: 


123   106,510
132    48,096
213    47,946
231    13,892
312    14,362
321     9,194

Total 240,000
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At this stage we can check the ‘quality’ of this sample by the correspondence of the above distribution 
with some theoretical values. I have shown elsewhere that the probabilities of the six rankings of three 
from a normally correlated population are given by 


Pr {123} = s{p+ssim —* sin tp +5 (sin p)*—— (sin ivr 
Pr {132} = Pr{213} = ies +i . 4p— (sin) +5 (sin wr 
Pr {231} = Pr {312} = ; (; + “sintp - > sin 4p—- ss (sin-1 p)? + - (sin-1 wr'| 
Pr {321} = ; (; - * sin + * sin tp+ * (sin-1 p)? — a (sin-} ip) ° 


Evaluating these expressions for the case of $\rho = 1/\sqrt{2}$ we can compare with the relative frequencies
of these rankings actually achieved in the sample. The results are as follows:

          Theoretical    Sample

  123       0.4431       0.4438
  132       0.2009       0.2004
  213       0.2009       0.1998
  231       0.0585       0.0579
  312       0.0585       0.0598
  321       0.0381       0.0383

















The correspondence appears to be satisfactory. 
Incidentally, from the distribution of rankings of five, we may derive the small sample distribution of 
Kendall’s rank correlation coefficient from this population. The distribution is as follows: 


Distribution of t (n = 5) from a normally correlated population with $\rho = 1/\sqrt{2}$





t Frequency 





1044 
2535 
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Total 12,000 














The mean and variance of this distribution are 0.4986 (0.50) and 0.10818 (0.10992) respectively, the
figures in brackets being the known theoretical values.
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In the same manner, we may derive the distribution of the Spearman rank correlation coefficient in 
samples of five from such a population. The distribution is as follows: 


Distribution of $r_s$ (n = 5) from a normally correlated population with $\rho = 1/\sqrt{2}$

  r_s    Frequency      r_s    Frequency

  1.0      1044        -0.1      235
  0.9      2535        -0.2      116
  0.8      1059        -0.3      142
  0.7      1982        -0.4       55
  0.6      1242        -0.5       67
  0.5       893        -0.6       55
  0.4       562        -0.7       37
  0.3       896        -0.8       18
  0.2       320        -0.9       16
  0.1       505        -1.0        2
  0.0       219

Total 12,000











The mean and variance of $r_s$ from this distribution are 0.5938 and 0.12365 respectively; the theoretical
value of the mean, from Moran’s formula, is 0.595.
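
The quoted mean and variance follow from the frequency table by direct summation. The sketch below transcribes the table and uses the divisor 12,000 (not 11,999) for the variance, which is the convention that reproduces the printed figure; the function name is illustrative.

```python
# r_s value -> frequency, transcribed from the table above
SPEARMAN_FREQ = {
    1.0: 1044, 0.9: 2535, 0.8: 1059, 0.7: 1982, 0.6: 1242, 0.5: 893,
    0.4: 562, 0.3: 896, 0.2: 320, 0.1: 505, 0.0: 219,
    -0.1: 235, -0.2: 116, -0.3: 142, -0.4: 55, -0.5: 67,
    -0.6: 55, -0.7: 37, -0.8: 18, -0.9: 16, -1.0: 2,
}


def mean_and_variance(freqs):
    """Mean and variance (divisor N) of a grouped frequency distribution."""
    total = sum(freqs.values())
    mean = sum(value * f for value, f in freqs.items()) / total
    second = sum(value**2 * f for value, f in freqs.items()) / total
    return mean, second - mean**2


rs_mean, rs_var = mean_and_variance(SPEARMAN_FREQ)
```

The same function may be applied to the distribution of t once its frequencies are tabulated.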


A final assessment of the value of this method must of course await further investigation on the
standard errors involved, and the speed with which each method of sampling can be carried out. Clearly
the 12,000 values of, say, $r_s$ obtained by 240 recombinations of each of the fifty independent bivariate
samples are ‘worth’ more than fifty independent values of $r_s$; but they would probably provide less
information than we should obtain by calculating one value of $r_s$ from each of 12,000 independent
bivariate samples. Thus if the same accuracy is obtained by drawing N independent samples and
calculating one value of the statistic from each, as by drawing n samples each providing m values of the
statistic by recombination, the practical problem is to determine which procedure involves the smaller
expenditure of effort.

3. APPLICATION TO THE UNIVARIATE CASE 


The above method of systematic sampling may also be applied to some univariate problems. For 
example, the sampling distribution of many distribution-free statistics for testing the homogeneity of 
two samples cannot be determined simply under certain non-null cases. Then we may use a systematic 
procedure of sampling experiments to obtain such distributions. Let the problem be one of determining 
the sampling distribution of some statistic based on ranks alone to test the homogeneity of two samples 
of m X’s and n Y’s, when the X’s and Y’s are drawn from populations which differ only in location. Then 
a laboratory sample is obtained by taking m X’s and n Z’s from identical populations of the specified 
type, and constructing a sample of n Y’s by adding a constant, say d, to each of the n Z’s. The extent to 
which the samples of X’s and Y’s represent populations differing in location depends on the extent to 
which the X’s and Z’s represent the identical populations. In order to ensure that the X’s and Z’s do 
represent samples from identical populations, they should satisfy the requirement that all the 
$\binom{m+n}{m}$ ways of partitioning a set of (m+n) observations into two sets of m and n respectively 
should be equally probable. In order to achieve this exactly, the following suggestion is made. We take 
a set of (m+n) observations and divide them into two sets of m and n in all the $\binom{m+n}{m}$ 
possible ways. In each of these partitions the n observations are transformed into the second sample of 
n Y’s by adding the quantity d to each of them. Such samples may then be used to study the distribution 
of such statistics as Wilcoxon’s two-sample test under alternative hypotheses. 
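
The systematic procedure just described may be sketched as follows. This is an illustration only: the observations in `obs`, the shift `d` and the function names are hypothetical, ties are assumed absent, and the rank sum of the Y's is used as a stand-in for Wilcoxon's statistic.

```python
from itertools import combinations


def rank_sum(xs, ys):
    """Sum of the ranks of the Y's in the pooled sample (no ties assumed)."""
    pooled = sorted(xs + ys)
    return sum(pooled.index(y) + 1 for y in ys)


def systematic_rank_sums(observations, m, d):
    """Rank sum of the Y's for every split of the observations into
    m X's and n Z's, the Y's being the Z's shifted by d."""
    idx = range(len(observations))
    results = []
    for x_idx in combinations(idx, m):
        chosen = set(x_idx)
        xs = [observations[i] for i in x_idx]
        ys = [observations[i] + d for i in idx if i not in chosen]
        results.append(rank_sum(xs, ys))
    return results


# Hypothetical laboratory sample with m + n = 5 observations.
obs = [-1.3, -0.4, 0.1, 0.9, 1.6]
null = systematic_rank_sums(obs, m=2, d=0.0)      # no shift
shifted = systematic_rank_sums(obs, m=2, d=1.0)   # location shift d = 1
print(sorted(null))
print(sorted(shifted))
```

With d = 0 every split is equally represented, so the mean rank sum equals its null expectation n(m+n+1)/2; a positive d moves the whole set of values upwards.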

4. CONCLUSION 


In using this method, the practical task is very much simplified because of the systematic nature of the 
sampling. However, it does not seem practicable to use this method for any large sample, for the task of 
enumerating all the possible permutations would be prohibitive. The main use of this method seems 
to be to estimate constants of certain distributions, rather than the distributions themselves. 
A note on ordered least-squares estimation 
By F. DOWNTON, University of Liverpool 


The parameters $\mu$, $\sigma$ of a distribution having the form $f\{(x-\mu)/\sigma\}$ may be estimated by 
applying the method of least squares to ordered observations, where ordering is according to magnitude. 
The general theory was given in a paper by Lloyd (1952). It was there shown that the ‘ordered’ estimate 
$\hat\mu$ of the expectation $\mu$ had a variance never exceeding that of the sample mean; and, in the case 
of symmetric distributions, necessary and sufficient conditions were obtained for the variance of 
$\hat\mu$ to be strictly less than that of the sample mean. 

In this note these conditions have been extended to include the unsymmetrical case. 

Let $x_1, x_2, \ldots, x_n$ be a sample of n independent observations on a continuous variate X whose 
distribution has the above form. The possibility of repetitions among the x’s then has zero probability. 
We may therefore, with probability one, write 

$$x_{(1)} < x_{(2)} < x_{(3)} < \cdots < x_{(n)} \qquad (1)$$

for the ordered observations $x_{(r)}$. 

Let 
$$z_r = (x_r-\mu)/\sigma, \qquad z_{(r)} = (x_{(r)}-\mu)/\sigma,$$
be the reduced observations, unordered and ordered respectively. They have parameter-free distributions, 
so that their moments may be regarded as known. 
Clearly, 
$$E(z_r) = 0 \quad (r = 1, 2, 3, \ldots, n). \qquad (2)$$

Let 
$$E(z_{(r)}) = \alpha_r, \qquad \operatorname{cov}(z_{(r)}, z_{(s)}) = \omega_{rs} \quad (r, s = 1, 2, 3, \ldots, n).$$

Let $\boldsymbol{\alpha}$ denote the (n x 1) vector of the $\alpha_r$, $\boldsymbol{\omega}$ the symmetric, 
positive-definite (n x n) matrix of the $\omega_{rs}$, $\mathbf{1}$ an (n x 1) vector of 1’s, and 
$\mathbf{x}$ the (n x 1) vector of the $x_{(r)}$. 

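We note for future use that $\mathbf{1}'\boldsymbol{\alpha} = 0$, that $\boldsymbol{\alpha} \neq \mathbf{0}$, 
and that $\mathbf{1}'\boldsymbol{\omega}\mathbf{1} = n$. The first of these results follows from the fact 
that the $z_r$ and the $z_{(r)}$ are simply permutations of the same set of numbers; hence 
$\sum z_r = \sum z_{(r)}$ and 

$$\mathbf{1}'\boldsymbol{\alpha} = \sum \alpha_r = \sum E(z_{(r)}) = E\left(\sum z_{(r)}\right) = E\left(\sum z_r\right) = \sum E(z_r) = 0.$$

A similar argument shows that $\mathbf{1}'\boldsymbol{\omega}\mathbf{1} = n$, since 
$\mathbf{1}'\boldsymbol{\omega}\mathbf{1}$ is the variance of $\sum z_{(r)} = \sum z_r$ and the $z_r$ are 
independent with unit variance. 

The second result follows from (1). For with r<s, we have $z_{(r)} < z_{(s)}$ and, taking expectations, 
$\alpha_r < \alpha_s$. Therefore $\alpha_r \neq \alpha_s$ and hence $\boldsymbol{\alpha} \neq \mathbf{0}$. 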

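The ‘ordered’ estimate $\hat\mu$ of $\mu$, and its variance, are given by 

$$\hat\mu = \boldsymbol{\alpha}'\boldsymbol{\omega}^{-1}(\boldsymbol{\alpha}\mathbf{1}' - \mathbf{1}\boldsymbol{\alpha}')\boldsymbol{\omega}^{-1}\mathbf{x}/\Delta,$$
$$\operatorname{var}(\hat\mu) = \boldsymbol{\alpha}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha}\,\sigma^2/\Delta,$$
where 
$$\Delta = (\mathbf{1}'\boldsymbol{\omega}^{-1}\mathbf{1})(\boldsymbol{\alpha}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha}) - (\mathbf{1}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha})^2.$$

Lloyd showed that when the distribution is symmetric, $\mathbf{1}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha} = 0$, 
which considerably simplifies these expressions. He also showed that a necessary and sufficient 
condition for the variance of $\hat\mu$ to be strictly smaller than $\sigma^2/n$ is that 
$\boldsymbol{\omega}\mathbf{1} \neq \mathbf{1}$, or, equivalently, that $\operatorname{var}(\hat\mu) = \sigma^2/n$ 
if and only if $\boldsymbol{\omega}\mathbf{1} = \mathbf{1}$. He showed further that when the variance of 
$\hat\mu$ equals that of the sample mean, $\hat\mu$ itself coincides with the sample mean. 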
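
As a numerical illustration of these expressions, the sketch below evaluates $\operatorname{var}(\hat\mu)/\sigma^2$ for a rectangular parent with n = 5. It is not part of the original note: the choice of parent is an assumption made for the example, the means and covariances of uniform order statistics used in `uniform_moments` are the standard ones, and the small linear solver is included only to keep the example self-contained.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def uniform_moments(n):
    """alpha and omega for the reduced order statistics of a rectangular
    parent (assumed example; mean 1/2 and variance 1/12 before reduction)."""
    scale = math.sqrt(12.0)
    alpha = [scale * (r / (n + 1) - 0.5) for r in range(1, n + 1)]
    omega = [[12.0 * min(r, s) * (n + 1 - max(r, s)) / ((n + 1) ** 2 * (n + 2))
              for s in range(1, n + 1)] for r in range(1, n + 1)]
    return alpha, omega


n = 5
alpha, omega = uniform_moments(n)
one = [1.0] * n
w_inv_alpha = solve(omega, alpha)   # omega^{-1} alpha
w_inv_one = solve(omega, one)       # omega^{-1} 1
aa = dot(alpha, w_inv_alpha)
oo = dot(one, w_inv_one)
oa = dot(one, w_inv_alpha)
delta = oo * aa - oa ** 2
ratio = aa / delta                  # var(mu_hat) / sigma^2
print(ratio, 1.0 / n)
```

For this parent the ratio works out to 1/7, below the value 1/n = 1/5 for the sample mean, and the side conditions $\mathbf{1}'\boldsymbol{\alpha} = 0$ and $\mathbf{1}'\boldsymbol{\omega}\mathbf{1} = n$ can be confirmed from `alpha` and `omega`.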

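In the general case, where no symmetry assumptions are made, we first note that both 
$\boldsymbol{\omega}$ and $\boldsymbol{\omega}^{-1}$ are positive-definite and symmetric and may therefore be 
expressed in the form 

$$\boldsymbol{\omega} = \mathbf{t}\mathbf{t}' \quad\text{and}\quad \boldsymbol{\omega}^{-1} = (\mathbf{t}^{-1})'\mathbf{t}^{-1},$$

where $\mathbf{t}$ is a lower triangular matrix. 
Consider any two (n x 1) vectors $\mathbf{b}$ and $\mathbf{c}$. Then 

$$\mathbf{b}'\boldsymbol{\omega}\mathbf{b} = \mathbf{b}'\mathbf{t}\mathbf{t}'\mathbf{b} = \mathbf{h}'\mathbf{h} = \sum h_i^2, \text{ say},$$
where $\mathbf{h} = \mathbf{t}'\mathbf{b}$. 
Also 
$$\mathbf{c}'\boldsymbol{\omega}^{-1}\mathbf{c} = \mathbf{c}'(\mathbf{t}^{-1})'\mathbf{t}^{-1}\mathbf{c} = \mathbf{k}'\mathbf{k} = \sum k_i^2, \text{ say},$$
where $\mathbf{k} = \mathbf{t}^{-1}\mathbf{c}$. 
Now by the Cauchy-Schwarz inequality we have 

$$\left(\sum h_i^2\right)\left(\sum k_i^2\right) \ge \left(\sum h_i k_i\right)^2, \qquad (3)$$

and the necessary and sufficient condition for equality is that 

$$h_i = \lambda k_i \text{ for some constant } \lambda \text{ and for all } i. \qquad (4)$$

In matrix form (3) and (4) become 
$$(\mathbf{b}'\boldsymbol{\omega}\mathbf{b})(\mathbf{c}'\boldsymbol{\omega}^{-1}\mathbf{c}) \ge (\mathbf{b}'\mathbf{c})^2, \qquad (5)$$
with 
$$\mathbf{b} = \lambda\boldsymbol{\omega}^{-1}\mathbf{c} \qquad (6)$$
as the condition for equality. 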


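Put $\mathbf{b} = \boldsymbol{\omega}^{-1}\mathbf{1} - \mathbf{1}$ and $\mathbf{c} = \boldsymbol{\alpha}$. Then (5) becomes 

$$(\mathbf{1}'\boldsymbol{\omega}^{-1}\mathbf{1} - 2\,\mathbf{1}'\mathbf{1} + \mathbf{1}'\boldsymbol{\omega}\mathbf{1})(\boldsymbol{\alpha}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha}) \ge (\mathbf{1}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha} - \mathbf{1}'\boldsymbol{\alpha})^2.$$

Now $\mathbf{1}'\mathbf{1} = \mathbf{1}'\boldsymbol{\omega}\mathbf{1} = n$ and $\mathbf{1}'\boldsymbol{\alpha} = 0$. 
Thus 
$$(\mathbf{1}'\boldsymbol{\omega}^{-1}\mathbf{1} - n)(\boldsymbol{\alpha}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha}) \ge (\mathbf{1}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha})^2,$$
or 
$$\{(\mathbf{1}'\boldsymbol{\omega}^{-1}\mathbf{1})(\boldsymbol{\alpha}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha}) - (\mathbf{1}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha})^2\}/(\boldsymbol{\alpha}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha}) = \sigma^2/\operatorname{var}(\hat\mu) \ge n,$$

since $\boldsymbol{\alpha}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha}$ is essentially positive. Thus 
$\operatorname{var}(\hat\mu) \le \sigma^2/n$; and the necessary and sufficient condition for equality is, from (6), that 
$$\boldsymbol{\omega}\mathbf{1} = \mathbf{1} - \lambda\boldsymbol{\alpha} \qquad (7)$$

for some scalar constant $\lambda$. When this condition is satisfied not only is the variance of $\hat\mu$ 
equal to that of the sample mean, but $\hat\mu$ itself necessarily is the sample mean. 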
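For, pre-multiplying (7) by $\boldsymbol{\alpha}'\boldsymbol{\omega}^{-1}$, $\mathbf{1}'\boldsymbol{\omega}^{-1}$ and 
$\boldsymbol{\omega}^{-1}$ we have 

$$\boldsymbol{\alpha}'\boldsymbol{\omega}^{-1}\mathbf{1} - \lambda\boldsymbol{\alpha}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha} = \boldsymbol{\alpha}'\mathbf{1} = 0, \qquad (8)$$
$$\mathbf{1}'\boldsymbol{\omega}^{-1}\mathbf{1} - \lambda\mathbf{1}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha} = \mathbf{1}'\mathbf{1} = n, \qquad (9)$$
and 
$$\boldsymbol{\omega}^{-1}\mathbf{1} - \lambda\boldsymbol{\omega}^{-1}\boldsymbol{\alpha} = \mathbf{1}, \qquad (10)$$
respectively. 
Now 
$$\Delta = (\boldsymbol{\alpha}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha})(\mathbf{1}'\boldsymbol{\omega}^{-1}\mathbf{1}) - (\mathbf{1}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha})^2$$
$$= \boldsymbol{\alpha}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha}\,(\mathbf{1}'\boldsymbol{\omega}^{-1}\mathbf{1} - \lambda\mathbf{1}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha}) \quad\text{from (8)}$$
$$= n\,\boldsymbol{\alpha}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha} \quad\text{from (9)}. \qquad (11)$$
Also 
$$\hat\mu = (\boldsymbol{\alpha}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha}\,\mathbf{1}' - \boldsymbol{\alpha}'\boldsymbol{\omega}^{-1}\mathbf{1}\,\boldsymbol{\alpha}')\boldsymbol{\omega}^{-1}\mathbf{x}/\Delta$$
$$= \boldsymbol{\alpha}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha}\,(\mathbf{1}'\boldsymbol{\omega}^{-1} - \lambda\boldsymbol{\alpha}'\boldsymbol{\omega}^{-1})\mathbf{x}/\Delta \quad\text{from (8)}$$
$$= \boldsymbol{\alpha}'\boldsymbol{\omega}^{-1}\boldsymbol{\alpha}\,\mathbf{1}'\mathbf{x}/\Delta \quad\text{from (10)},$$

whence, by (11), $\hat\mu = \mathbf{1}'\mathbf{x}/n$, which is the sample mean. 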
I am grateful to Dr E. H. Lloyd for an extremely stimulating correspondence on this subject. 


REFERENCE 


Lloyd, E. H. (1952). Least squares estimation of location and scale parameters using order statistics. 
Biometrika, 39, 88. 


A note on the evaluation of the multivariate normal integral 
By F. N. DAVID, University College, London 


1. A problem which arises frequently is concerned with the evaluation of the multivariate normal 
integral. Thus if $x_1, x_2, \ldots, x_n$ are normal variables, all correlated, we often need 

$$P\{x_1>0, x_2>0, \ldots, x_n>0\}.$$

Little work of practical use has appeared since the papers of Moran (1948) and Kendall (1941). It is 
the purpose of this note to challenge others to the solution of the problem and to summarize the position, 
as the writer sees it, at the present time. 
2. Let the event $E_i$ consist in the random variable $x_i$ being positive. Thus 
$$P\{x_i>0\} = P\{E_i\} = \tfrac{1}{2}.$$

For two variables $x_1$ and $x_2$ we have the well-known result of Sheppard (1898), 

$$P\{x_1>0, x_2>0\} = P\{E_1E_2\} = \frac{1}{2}\left[1 - \frac{\cos^{-1}\rho_{12}}{\pi}\right],$$

where $\rho_{12}$ is the correlation between $x_1$ and $x_2$. This result is simply achieved by an appeal to 
geometry, as is also the result for three variables. With three variables the probability ellipsoid can 
be transformed into a sphere with the original co-ordinate planes becoming three planes through the 
origin. It is easy to show for three (or more) dimensions that the angle between the ith and jth 
transformed co-ordinate planes is just $\cos^{-1}\rho_{ij}$, where $\rho_{ij}$ is the correlation between the 
original variables $x_i$ and $x_j$. The required probability for three (or more) variables is just the 
solid angle in three or more dimensions when the angles between the planes containing the solid angle 
are given. An exact expression for this solid angle appears to be known for three dimensions only. 


3. A curious recurrence relation for the required probabilities is possible. This relation has its 
analogue in many branches of mathematics; we may quote, for example, any text-book on 
multi-dimensional geometry. It was also partially explored in relation to an n-dimensional integral by 
Schläfli (1857). The theorem usually ascribed to Boole is as follows: 

$$P\{E_1+E_2+\ldots+E_n\} = \sum_i P\{E_i\} - \sum\sum_{i<j} P\{E_iE_j\} + \sum\sum\sum_{i<j<l} P\{E_iE_jE_l\} - \ldots + (-1)^{n-1}P\{E_1E_2\ldots E_n\}.$$

Now for the problem considered 
$$P\{E_1+E_2+\ldots+E_n\} = 1 - P\{\bar E_1\bar E_2\ldots\bar E_n\} = 1 - P\{E_1E_2\ldots E_n\},$$
by the symmetry of the multivariate normal distribution. Hence for n odd we have 
$$P\{E_1E_2\ldots E_n\} = \frac{1}{2}\left[1 - \sum_i P\{E_i\} + \sum\sum_{i<j} P\{E_iE_j\} - \ldots \text{ etc.}\right],$$

the last term on the right-hand side being the sum of the joint probabilities of the E’s taken n-1 at 
a time. The result for n = 3 is immediate: 

$$P\{E_1E_2E_3\} = \frac{1}{2}\left[1 - \frac{3}{2} + \frac{3}{2} - \frac{\cos^{-1}\rho_{12}}{2\pi} - \frac{\cos^{-1}\rho_{13}}{2\pi} - \frac{\cos^{-1}\rho_{23}}{2\pi}\right]$$
$$= \frac{1}{4\pi}\left[2\pi - \cos^{-1}\rho_{12} - \cos^{-1}\rho_{13} - \cos^{-1}\rho_{23}\right].$$
For n even we get an identity. 
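
The two closed forms above are easily evaluated. The sketch below is an illustration only, with function names chosen for the example; it also rebuilds the three-variable result from Boole's relation for n = 3 so that the two routes can be compared.

```python
import math


def orthant2(rho12):
    """Sheppard's result: P{x1 > 0, x2 > 0} for correlation rho12."""
    return 0.5 * (1.0 - math.acos(rho12) / math.pi)


def orthant3(rho12, rho13, rho23):
    """Closed form for P{x1 > 0, x2 > 0, x3 > 0}."""
    total = math.acos(rho12) + math.acos(rho13) + math.acos(rho23)
    return (2.0 * math.pi - total) / (4.0 * math.pi)


def orthant3_via_boole(rho12, rho13, rho23):
    """Same probability from the relation for n odd with n = 3:
    P{E1 E2 E3} = (1/2) [1 - sum P{Ei} + sum P{Ei Ej}]."""
    s1 = 3 * 0.5
    s2 = orthant2(rho12) + orthant2(rho13) + orthant2(rho23)
    return 0.5 * (1.0 - s1 + s2)


print(orthant2(0.0), orthant3(0.0, 0.0, 0.0))
print(orthant2(0.5), orthant3(0.5, 0.5, 0.5))
```

Independent variables give 1/4 and 1/8, as they should, and the two routes to the three-variable probability agree for any admissible set of correlations.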


4. From the recurrence relation we can deduce the results for n = 5 given that for n = 4, but it does 
not seem possible to obtain a recurrence relation which will enable us to make the step from 3 to 4. 
Moran, following Kendall, gave an expansion for $P\{E_1E_2E_3E_4\}$, but experience has shown that this 
does not converge fast enough for practical purposes except for $\rho_{ij}$ small. For example, for 
$\rho_{ij}$ in the neighbourhood of 0.5 at least twenty sets of terms are necessary to obtain reasonable 
accuracy. Some work has been done towards obtaining a series expansion for $\rho_{ij}$ nearly 0.5 in terms 
of $(\rho_{ij} - 0.5)$, but only in certain restricted cases. What is required is a quickly convergent 
series for $\rho_{ij}$ large and n even. From n = 4 we deduce n = 5, calculate n = 6 from the series, 
deduce n = 7 and so on. The problem is difficult but not unrewarding. 
A graphical method for the analysis of statistical distributions 
into two normal components 


By ERIC J. PRESTON, Peterhouse, University of Cambridge 


1. INTRODUCTION 


Many frequency distributions occur in statistical practice which are probably due to the existence of 
two or more separate sub-universes within the universe under consideration, each of which is of approxi- 
mately Normal form; for instance, the heights or weights of English men and women, or the intelligence 
quotients of professional and artisan engineers would probably have this property. If the sub-universes 
have different mean values, different variances and contain different numbers of individuals, a great 
variety of composite distributions will result. The existence of sub-universes is not usually obvious from 
the bimodal or multi-modal appearance of a frequency curve, since separate modes do not appear unless 
the separation of the means is considerable; it needs to be about three times the standard deviation of 
the components, if they are of comparable size. The following problem therefore arises: given a sample 
(preferably of several thousand individuals), selected from a universe suspected for some reason of 
having such components, can we determine the most probable nature of the sub-universes? Perhaps 
most important, is there a quick practical way of doing so? 

The ‘method of moments’ was first used in a theoretical solution of the problem by K. Pearson (1894). 
Since then, the more efficient ‘method of maximum likelihood’ has been developed by R. A. Fisher and 
others, and applied in particular to this problem by C. R. Rao (1948). Rao also gives, in the same paper, 
a rapid and elegant theoretical solution, by the method of moments, for the case of two components 
assumed to have equal variances. This depends upon the solution of a cubic equation, and appears to 
yield quite accurate results. 

However, it seems to us that all these contributions suffer from over-complexity and the necessity 
for lengthy calculations. The methods do indeed give the most accurate results possible from the 
available sample, but this degree of accuracy seems rather unnecessary in view of the unavoidable 
inherent errors due to sampling fluctuations and to the assumptions that the true components are normal 
and have equal variances. Thus there seems to be room for a more rapid, and much less laborious, 
graphical method. 

This paper deals with such a method for the simple case of two components assumed to have equal 
variances; the variances seem to be very often approximately equal in practice and the assumption 
permits a much more immediate solution of the problem. The method is based on the method of moments 
and the use of a ‘skewness-kurtosis’ diagram. 


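2. NOTATION 


Let the means of the two component normal populations be $\mu_1$ and $\mu_2$, and let $\sigma$ be their 
common standard deviation. Let the subscript 1 be applied to the larger component, and let $\rho$ be the 
ratio of the number of individuals in the first component to those in the second, so that $\rho \ge 1$. 
Write $\delta = (\mu_2-\mu_1)/\sigma$ and denote the cumulants of the composite distribution by 
$\kappa_1$, $\kappa_2$, etc., where $\kappa_1$ is the distance of the mean from some convenient origin. 
Then it may be shown that 

$$\kappa_1 = (\rho\mu_1+\mu_2)/(\rho+1),$$
$$\kappa_2 = \sigma^2\{(\rho+1)^2+\rho\delta^2\}/(\rho+1)^2,$$
$$\kappa_3 = \sigma^3\rho(\rho-1)\delta^3/(\rho+1)^3,$$
$$\kappa_4 = \sigma^4\rho(\rho^2-4\rho+1)\delta^4/(\rho+1)^4.$$

Hence 
$$\gamma_1 = \sqrt{\beta_1} = \frac{\kappa_3}{\kappa_2^{3/2}} = \frac{\rho(\rho-1)\delta^3}{\{(\rho+1)^2+\rho\delta^2\}^{3/2}}, \qquad (1)$$
$$\gamma_2 = \beta_2 - 3 = \frac{\kappa_4}{\kappa_2^{2}} = \frac{\rho(\rho^2-4\rho+1)\delta^4}{\{(\rho+1)^2+\rho\delta^2\}^{2}}. \qquad (2)$$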

3. THE ‘SKEWNESS-KURTOSIS’ DIAGRAM 


The accompanying diagram shows systems of contours for constant $\rho$ and constant $\delta$ plotted with 
$\gamma_1$ and $\gamma_2$ as rectangular co-ordinates. If $\gamma_1<0$, the chart may be entered with 
$|\gamma_1|$ and $\gamma_2$ as co-ordinates, the values of $\delta$ being given negative signs. Thus if 
$\gamma_1$ (and $\kappa_3$) is positive, the larger component has the lower mean, while if $\gamma_1<0$ the 
reverse is the case. 

Fig. 1. Skewness-kurtosis ($\gamma_1$, $\gamma_2$) chart for distributions with two normal 
components having equal variance. 


The lines form a network within the ‘bounding parabola’ $\gamma_1^2 = \gamma_2 + 2$, so that in this region 
values of $\rho$ and $\delta$ corresponding to any point exist, are unique, and may be estimated. The 
‘bounding parabola’ corresponds to infinite separation of the components and points inside it satisfy 
the condition 

$$\kappa_4 + 2\kappa_2^2 - \kappa_3^2/\kappa_2 \ge 0,$$

satisfied by all real frequency distributions, so that no real points can lie outside the parabola. 
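
That infinite separation leads to the bounding parabola can be checked from equations (1) and (2). The short derivation below is a sketch added for illustration, using only the symbols already defined and taking $\rho$ as fixed while $\delta \to \infty$.

```latex
\begin{align*}
\lim_{\delta\to\infty}\gamma_1 &= \frac{\rho(\rho-1)}{\rho^{3/2}} = \frac{\rho-1}{\sqrt{\rho}},\\
\lim_{\delta\to\infty}\gamma_2 &= \frac{\rho(\rho^2-4\rho+1)}{\rho^{2}} = \frac{\rho^2-4\rho+1}{\rho},\\
\lim_{\delta\to\infty}\left(\gamma_1^2-\gamma_2\right) &= \frac{(\rho-1)^2-(\rho^2-4\rho+1)}{\rho} = \frac{2\rho}{\rho} = 2 .
\end{align*}
```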

Other interesting properties of the diagram, most of which are of purely theoretical importance, are 
outlined here: 

If the sub-universes are equal in size, $\rho = 1$, $\gamma_1 = 0$ and $\gamma_2$ is negative; the 
distribution is symmetrical and rather platykurtic, as is also obvious from consideration of the shape 
of the frequency curve. With unequal components there is skewness, and platy- or leptokurtosis (i.e. 
the excess kurtosis is negative or positive) according as $\rho$ is less or greater than $(2+\sqrt{3})$. 
If $\rho = (2+\sqrt{3})$ there is no excess kurtosis for any separation. As $\rho$ becomes large, 
$\gamma_1$ and $\gamma_2$ both decrease towards zero, and with any finite separation tend to the limit 
zero as $\rho$ tends to infinity (with infinite separation, $\gamma_1$ diverges to infinity). With zero 
separation, trivially, both $\gamma_1$ and $\gamma_2$ are zero for all values of $\rho$. The curves of 
constant separation are a family of cardioids, and with large values of $\rho$, near the origin, closely 
approach the straight lines $\gamma_2 = \gamma_1\delta$. The curves of constant ratio $\rho$ are quartic 
parabolas, $\gamma_1^4/\gamma_2^3 = \rho(\rho-1)^4/(\rho^2-4\rho+1)^3$ (constant). 


4. USE OF THE DIAGRAM 


To estimate $\rho$ and $\delta$ by the method of moments when we have available a sample of n individuals 
from the composite population, we obtain in the usual way from the k-statistics estimates $g_1$ and 
$g_2$ of $\gamma_1$ and $\gamma_2$. These may be used as co-ordinates in the chart to obtain approximate 
values r and d for $\rho$ and $\delta$, by interpolating between the contours. An estimate, s, of 
$\sigma$ may then be obtained from 

$$s = (r+1)\sqrt{k_2/\{(r+1)^2+rd^2\}}, \qquad (3)$$
while the estimates $m_1$ and $m_2$ of $\mu_1$ and $\mu_2$ are given by 
$$m_1 = k_1 - ds/(r+1), \qquad m_2 = k_1 + rds/(r+1). \qquad (4)$$

Here $k_1$ and $k_2$ are the first two k-statistics of the composite sample. 
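
Where the chart itself is not to hand, its reading can be imitated numerically. The sketch below is a substitute for the graphical step, not the method of the paper: it inverts equations (1) and (2) by bisection on the ratio $\gamma_1^4/\gamma_2^3$, which depends on $\rho$ alone, and then solves equation (1) for $\delta$. The function names are hypothetical, and the sketch assumes $g_1 \neq 0$, $g_2 \neq 0$ and a point inside the bounding parabola.

```python
import math


def gammas(rho, delta):
    """Equations (1) and (2): skewness and excess kurtosis of the mixture."""
    big_d = (rho + 1) ** 2 + rho * delta ** 2
    g1 = rho * (rho - 1) * delta ** 3 / big_d ** 1.5
    g2 = rho * (rho ** 2 - 4 * rho + 1) * delta ** 4 / big_d ** 2
    return g1, g2


def _ratio(rho):
    """gamma_1^4 / gamma_2^3, a function of rho alone."""
    return rho * (rho - 1) ** 4 / (rho ** 2 - 4 * rho + 1) ** 3


def read_chart(g1, g2):
    """Numerical stand-in for the chart: return (r, d) reproducing (g1, g2)."""
    if g1 == 0 or g2 == 0:
        raise ValueError("this sketch needs g1 != 0 and g2 != 0")
    if g1 * g1 >= g2 + 2:
        raise ValueError("point lies outside the bounding parabola")
    target = g1 ** 4 / g2 ** 3
    crit = 2 + math.sqrt(3)
    # Platykurtic points have 1 < rho < 2 + sqrt(3); leptokurtic ones exceed it.
    lo, hi = (1.0, crit) if g2 < 0 else (crit, 1e9)
    for _ in range(200):  # the ratio decreases with rho on either branch
        mid = 0.5 * (lo + hi)
        if _ratio(mid) > target:
            lo = mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)
    c = abs(g1) ** (2.0 / 3.0)
    u = c * (r + 1) ** 2 / ((r * (r - 1)) ** (2.0 / 3.0) - c * r)  # delta^2
    d = math.copysign(math.sqrt(u), g1)  # negative g1 gives negative d
    return r, d


r, d = read_chart(0.066, -0.215)  # g1, g2 of the first example below
print(r, d)
```

A round trip through `gammas` and `read_chart` recovers the starting values of $\rho$ and $\delta$, which is a convenient check on any reading taken in this way.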


5. PRACTICAL USE OF THE METHOD 


We take as a first example the data used by Rao (1948) for the same purpose, viz. the heights in 
centimetres of 454 plants, shown in Table 1. 


Table 1. Heights of 454 plants in centimetres 

Central value    8   9  10  11  12  13  14  15  16  17  18  19  20   Total 
Frequency        3   9  21  40  59  76  79  69  46  30  13   7   2     454 

The k-statistics (calculated by Rao) are: 
$k_1 = -0.244$ (about 14 as origin), $k_3 = 0.729$, 
$k_2 = 4.976$, $k_4 = -5.315$. 
Hence $g_1 = 0.066$, $g_2 = -0.215$. 
Using these values in the chart, we obtain the estimates 
$r = 1.4$, $d = 1.4$. 


From equation (3), $s = 1.836$. 
From equations (4), 
$$m_1 = -0.244 - 2.57/2.4 = -1.31,$$
$$m_2 = -0.244 + 1.4 \times 2.57/2.4 = 1.26.$$


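The chief source of error in these estimates lies in the difficulty of accurate estimation from the 
diagram. Assuming a standard error of $\pm 0.05$ in the estimation of both r and d, the standard errors of 
the results due to errors of interpolation may be found from 

$$\sigma_{p_r} = \sigma_r/(1+\rho)^2, \qquad [p_r = r/(r+1)]$$
$$\sigma_s^2 = \frac{\sigma^2\delta^2}{4(\rho+1)^2\{(\rho+1)^2+\rho\delta^2\}^2}\{(\rho-1)^2\delta^2\sigma_r^2 + 4\rho^2(\rho+1)^2\sigma_d^2\},$$
$$\sigma_{m_1}^2 = \frac{\sigma^2}{4(\rho+1)^2\{(\rho+1)^2+\rho\delta^2\}^2}\{[2(\rho+1)+\delta^2]^2\delta^2\sigma_r^2 + 4(\rho+1)^4\sigma_d^2\},$$
$$\sigma_{m_2}^2 = \frac{\sigma^2}{4(\rho+1)^2\{(\rho+1)^2+\rho\delta^2\}^2}\{[2(\rho+1)+\rho\delta^2]^2\delta^2\sigma_r^2 + 4\rho^2(\rho+1)^4\sigma_d^2\}. \qquad (5)$$

The results of these calculations, together with the theoretical results of Rao by the methods of moments 
and of maximum likelihood, and the standard errors of the methods due to sampling and assumptions, 
are given in Table 2. 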
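
The arithmetic of equations (3) and (4), and the propagation of reading errors in r and d, can be sketched as below. This is an illustration under stated assumptions rather than the author's working: the error expressions are obtained by differentiating (3) and (4) with respect to r and d, treating $k_1$ and $k_2$ as fixed, and the function names are hypothetical.

```python
import math


def component_estimates(k1, k2, r, d):
    """Equations (3) and (4): s, m1, m2 and the proportion p_r = r/(r+1)."""
    big_d = (r + 1) ** 2 + r * d ** 2
    s = (r + 1) * math.sqrt(k2 / big_d)
    return {
        "p_r": r / (r + 1),
        "m1": k1 - d * s / (r + 1),
        "m2": k1 + r * d * s / (r + 1),
        "s": s,
    }


def reading_errors(k2, r, d, se_r=0.05, se_d=0.05):
    """Standard errors due to chart reading alone (first-order propagation)."""
    big_d = (r + 1) ** 2 + r * d ** 2
    s = (r + 1) * math.sqrt(k2 / big_d)
    base = s ** 2 / (4 * (r + 1) ** 2 * big_d ** 2)
    var_s = base * d ** 2 * ((r - 1) ** 2 * d ** 2 * se_r ** 2
                             + 4 * r ** 2 * (r + 1) ** 2 * se_d ** 2)
    var_m1 = base * ((2 * (r + 1) + d ** 2) ** 2 * d ** 2 * se_r ** 2
                     + 4 * (r + 1) ** 4 * se_d ** 2)
    var_m2 = base * ((2 * (r + 1) + r * d ** 2) ** 2 * d ** 2 * se_r ** 2
                     + 4 * r ** 2 * (r + 1) ** 4 * se_d ** 2)
    return {
        "p_r": se_r / (r + 1) ** 2,
        "m1": math.sqrt(var_m1),
        "m2": math.sqrt(var_m2),
        "s": math.sqrt(var_s),
    }


# First example: Rao's plant heights, chart readings r = d = 1.4.
est = component_estimates(k1=-0.244, k2=4.976, r=1.4, d=1.4)
err = reading_errors(k2=4.976, r=1.4, d=1.4)
print(est)
print(err)
```

The same function applies unchanged to any other pair of chart readings, for instance those of the second example in this section.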
Table 2 

                            Estimated value by method of        Standard error of           Standard error 
Parameter                   Max.                                estimation from diagram     of max. likelihood 
estimated      Estimate     likelihood    Moments    Diagram    ($\sigma_r = \sigma_d = 0.05$)   estimates 
$\rho/(1+\rho)$   $p_r$        0.567        0.583      0.58          0.009                      0.161 
$\mu_1$           $m_1$       -1.345       -1.341     -1.31          0.034                      0.049 
$\mu_2$           $m_2$        1.282        1.286      1.26          0.043                      0.033 
$\sigma$          $s$          1.823        1.816      1.84          0.021                      0.049 


























The reading errors are rather large, but could be reduced if necessary by constructing a large number 
of contours on as large a scale as was convenient. Also, in some cases the errors would be greater owing 
to the shape of the diagram, but $\rho$ and $\delta$ are usually small ($\rho<5$, $1<|\delta|<3$) and can 
then be estimated with considerable accuracy. 

A second example may be taken from the distributions of length of trypanosome strains analysed in 
some detail by K. Pearson (1914). The second column of Table 3 shows the observed frequency 
distribution of the length (in microns) of 1000 individuals of Trypanosoma gambiense as recorded by 
Pearson. The moments and moment ratios of the distribution, using our present notation, are given by 
Pearson (p. 131)* as 

$k_1 = 22.1130$, $g_1 = 0.5361$, 
$k_2 = 14.3389$, $g_2 = -0.4156$. 

Entering our chart with these values of $g_1$ and $g_2$, we read off the estimates of $\rho$ and $\delta$ as 

$r = 2.70$, $d = 3.15$. 

Hence we find 
$s = 2.202$, $m_1 = 20.238$, $m_2 = 27.175$, 
$p_1 = r/(1+r) = 0.730$, $p_2 = 1-p_1 = 0.270$. 


The expected frequencies, based on the two normal components with a common standard deviation 
estimated by s, are shown in the third column of Table 3 and the differences (observed - theory) in the 
fourth. After combining four groups in each tail, it is found that $\chi^2 = 15.28$, with 13 degrees of 
freedom. The probability of exceeding this value if our theoretical structure were correct is 0.29. The 
fit is quite good, but it is evident that there are certain systematic differences between theory and 
observation. 

K. Pearson (1914, p. 131) did not assume a common standard deviation for the two components and 
used his general moment method of fitting (1894) involving the solution of a nonic. He found 

$m_1 = 19.8926$, $s_1 = 2.0566$; $m_2 = 26.2463$, $s_2 = 2.6260$; 
$p_1 = 0.6505$, $p_2 = 0.3495$. 

Using these estimates of the parameters, the theoretical distribution given in the fifth column of Table 3 
is obtained, with the differences (observed - theory) of the sixth column. In this case $\chi^2 = 10.77$ 
with 12 degrees of freedom, a value exceeded by chance with a probability of 0.55. The fit has been 
improved by allowing the two variances to differ, but there is still evidence of some systematic 
discrepancy, as shown by the run of the signs of differences. 
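
The two values of $\chi^2$ may be recomputed from the grouped columns of Table 3. The sketch below is an illustration only: the lists are transcribed from Table 3 with the four classes in each tail combined as described in the text, and the degrees of freedom are taken as the number of groups less one, less the number of fitted parameters (four for the diagram fit, five for Pearson's).

```python
# Groups: lengths 12-15 combined, 16 to 31 singly, 32-35 combined.
OBSERVED = [10, 21, 56, 79, 114, 122, 110, 85, 85, 61, 47, 49, 47, 44, 31, 20, 11, 8]
DIAGRAM = [11.4, 21.2, 45.3, 78.9, 112.3, 130.5, 124.6, 98.9, 68.8,
           48.7, 43.3, 46.7, 49.7, 45.6, 34.8, 21.6, 11.0, 6.7]
PEARSON = [10.6, 21.6, 47.4, 82.9, 115.1, 128.0, 115.8, 89.2, 65.6,
           54.5, 53.3, 54.3, 51.0, 42.4, 30.7, 19.2, 10.5, 7.9]


def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over the groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n_groups, n_fitted):
    """Groups, less one for the fixed total, less the fitted parameters."""
    return n_groups - 1 - n_fitted


chi_diagram = chi_square(OBSERVED, DIAGRAM)
chi_pearson = chi_square(OBSERVED, PEARSON)
df_diagram = degrees_of_freedom(len(OBSERVED), 4)
df_pearson = degrees_of_freedom(len(OBSERVED), 5)
print(chi_diagram, df_diagram)
print(chi_pearson, df_pearson)
```

Small differences from the printed values are to be expected, since the tabulated frequencies are rounded to one decimal place.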


6. SUMMARY 


This graphical method of analysis into two components by means of the skewness-kurtosis diagram may 
be useful in many cases where a simple, rapid determination of the probable nature of the two 
sub-universes is wanted. It is less efficient and less accurate than existing theoretical methods, but 
there are always unavoidable errors due to sampling and necessary assumptions about the components. It 
is therefore likely to be of particular value, in contrast to these methods, in producing quick 
approximations to the components of small samples. Any kind of distribution may be analysed by this 
method into two normal components, irrespective of whether its character is, or is not, due to their 
physical existence, since all real points on the diagram lie within the bounding parabola; hence the 
method cannot offer any confirmation of their physical existence. It may, however, permit us to devise 
physical and experimental tests to identify their physical nature if their existence seems probable. 


* Pearson gives values of moments, with n as divisor, not k-statistics, but with n = 1000 the slight 
adjustment called for seemed unnecessary. 











Table 3 

Central          Observed     Distribution fitted                  Pearson's composite 
value            frequency    by diagram            Difference     distribution           Difference 
12                   -             0.2                                  0.1 
13                   1             0.6                                  0.5 
14                   -             2.5                                  2.2 
15                   9             8.1                                  7.8 
(12-15 combined)    10            11.4                 -1.4            10.6                  -0.6 
16                  21            21.2                 -0.2            21.6                  -0.6 
17                  56            45.3                 10.7            47.4                   8.6 
18                  79            78.9                  0.1            82.9                  -3.9 
19                 114           112.3                  1.7           115.1                  -1.1 
20                 122           130.5                 -8.5           128.0                  -6.0 
21                 110           124.6                -14.6           115.8                  -5.8 
22                  85            98.9                -13.9            89.2                  -4.2 
23                  85            68.8                 16.2            65.6                  19.4 
24                  61            48.7                 12.3            54.5                   6.5 
25                  47            43.3                  3.7            53.3                  -6.3 
26                  49            46.7                  2.3            54.3                  -5.3 
27                  47            49.7                 -2.7            51.0                  -4.0 
28                  44            45.6                 -1.6            42.4                   1.6 
29                  31            34.8                 -3.8            30.7                   0.3 
30                  20            21.6                 -1.6            19.2                   0.8 
31                  11            11.0                  0.0            10.5                   0.5 
32                   4             4.6                                  4.9 
33                   4             1.6                                  2.0 
34                   -             0.4                                  0.7 
35                   -             0.1                                  0.1 
(32-35 combined)     8             6.7                  1.3             7.9                   0.1 
Total             1000          1000                                 1000 


























I wish to record my sincere thanks to the Editors of Biometrika for their most generous and helpful 
advice and assistance in preparing the final draft of this paper; to my brother, Dr F. W. Preston, of 
Butler, Pa., U.S.A., through whose interest and at whose suggestion I originally began investigation of 
the subject; and to his wife Jane, and to Mr R. R. Lehnerd, of the Preston Laboratories, Butler, for 
their care and patience in making legible the original script and diagrams. 


REFERENCES 


Pearson, K. (1894). Phil. Trans. A, 185, 71. 
Pearson, K. (1914). Biometrika, 10, 85. 
Rao, C. R. (1948). J.R. Statist. Soc. B, 10, 166. 











A note on regions for tests of kurtosis 


By G. E. P. BOX 
Imperial Chemical Industries, Dyestuffs Division Headquarters, Blackley, Manchester 9 


The most common test criterion for kurtosis is $b_2$, the sample fourth moment divided by the fourth 
power of the standard deviation; an alternative criterion proposed by Geary (1935) and denoted by 
$a$ is the ratio of the sample mean deviation to the sample standard deviation. The present author has 
shown in a paper appearing on p. 318 of this volume that Bartlett’s (1937) modification of the 
Neyman-Pearson $L_1$ criterion (1931) for testing for the equality of variances is so sensitive to 
kurtosis that when there is only one observation per group (group means assumed known) this criterion, 
denoted by $M_1$, is of the same order of sensitivity to kurtosis as is $b_2$ or $a$. 




















[Fig. 1. Geometrical properties of tests of kurtosis. Panels (i) and (ii): illustration of symmetry 
conditions when N = 3. Panels (iii) Rectangular, (iv) Normal, (v) Double exponential: distributions and 
probability contours showing directions of accumulation of probability density.] 


A better understanding of the relations which the various tests for kurtosis have to one another and 
to $M_1$ is gained by considering the general nature of critical regions for such tests. For simplicity we 
shall need to assume that the mean is known and equal to zero. We notice first that any criterion to 
detect kurtosis must be independent of scale; if we reach a certain conclusion from a sample 
$y_1, y_2, \ldots, y_N$ we must reach the same conclusion from the sample $ky_1, ky_2, \ldots, ky_N$. It 
follows that the boundary of the critical region in the sample space consists of one or more cones with 
apices at the origin. Also each observation has equal weight so that if the point 
$y_1, y_2, \ldots, y_N$ is on the boundary of a critical region then so must be the remaining $N!-1$ 
points obtained by permuting the order of the values. Finally, since a test of kurtosis should detect 
departures from normality other than those due to asymmetry of the distribution, the test criterion must 
be independent of the signs of the observations. Thus the pattern of critical subregions in the region 
of the space in which the observations are all positive will be repeated in all the remaining $2^N-1$ 
regions of the space generated by possible differences in sign. The conditions of symmetry thus imposed 
are illustrated for N = 3 in Fig. 1 (i) and (ii) which shows cross-sections of subsets of cones 
satisfying these conditions. When the null-hypothesis of normality is true, the probability density in 
the sample space is constant over the surfaces of N-dimensional hyperspheres with centre at the origin. 
Consequently if the hypercones of the critical region are such that they include a proportion $\alpha$ 
of the total surface of such hyperspheres the error of the first kind is controlled at the level 
$\alpha$ irrespective of the value of the scale parameter. 
In addition to controlling this error, we require the test to pick out as often as possible a departure 
from normal kurtosis when it occurs. Consider the probability contours corresponding to symmetrical 
distributions having varying amounts of kurtosis in the case N = 3. For the normal distribution the 
contours are spheres (Fig. 1 (iv)). As $\gamma_2$ falls below zero, the contours tend to belly out and for 
the rectangular distribution ($\gamma_2 = -1.2$) the single probability contour is the surface of a cube 
(Fig. 1 (iii)), the density within the cube being uniform. As $\gamma_2$ is made larger than zero the 
contours tend to sag inwards till for the double exponential distribution ($\gamma_2 = 3$) the contours 
are surfaces of regular octahedra (Fig. 1 (v)). In general for platykurtic distributions 
($\gamma_2<0$) there is an accumulation of probability density along the equiangular lines (and a 
depletion along the co-ordinate axes), whilst for leptokurtic distributions ($\gamma_2>0$) there is an 
accumulation of probability density along the co-ordinate axes (and a depletion in the direction of the 
equiangular lines). It follows that (i) a good critical region for testing for platykurtosis would be 
the $2^N$ hypercones having apices at the centre and axes along the $2^N$ equiangular lines, whilst 
(ii) a good critical region for testing for leptokurtosis would consist of the 2N hypercones having 
apices at the centre and axes along the 2N branches of the co-ordinate axes. 

[Fig. 2. Sections of critical regions. Sections are taken at right angles to the equiangular lines and 
at right angles to the co-ordinate axes; panel (i) shows the regions $c<c(1-\alpha)$ and $c>c(\alpha)$.] 

The directions of the axes of cones forming such regions when N = 3 are shown by arrows in Fig. 
1 (iii) and (v) respectively. In particular $b_2$ and $a$ have critical regions of these types and so has 
$M_1$. In each case, the particular type which occurs depends on whether we are using the criterion to 
test for leptokurtosis or for platykurtosis. The regions differ only in the shapes of their 
cross-section and even these differences are limited to some extent by the symmetry conditions listed 
above. For the case N = 3, sections of cones corresponding to the approximate upper and lower 5 % levels 
for the test criteria $b_2$, $a$ and $M_1$ are shown in Fig. 2 (ii), (iii) and (iv). (In order to make 
clear that the region in Fig. 2 (ii) defined by $b_2 > b_2(\alpha)$ has a non-circular section, it has 
been necessary slightly to exaggerate the departure from circularity.) 

2. It is of some interest to consider kurtosis tests from the point of view of the Neyman-Pearson 
theory of best critical regions. Suppose that the distribution specified by the alternative hypothesis is 
a member of the family 

$$p(y) = \left\{\frac{2\rho}{q}\,\Gamma\!\left(\frac{1}{q}\right)\right\}^{-1}\exp\left\{-\left|\frac{y}{\rho}\right|^{q}\right\} \quad (-\infty<y<+\infty,\ 0<q<\infty,\ 0<\rho<\infty), \qquad (1)$$

and q is given. This is a symmetrical distribution with mean zero and 

$$\kappa_2 = \rho^2\,\frac{\Gamma(3/q)}{\Gamma(1/q)}, \qquad \gamma_2 = \frac{\Gamma(5/q)\,\Gamma(1/q)}{\{\Gamma(3/q)\}^2} - 3. \qquad (2)$$

When q = 2, $\gamma_2$ is zero and the distribution is normal. When q>2 the distribution is platykurtic 
and, in particular, when q tends to infinity the distribution tends to the rectangular, for (1) tends to 
$$p(y) = 1/(2\rho) \quad (|y|<\rho), \qquad p(y) = 0 \quad (|y|>\rho). \qquad (3)$$
When q<2 the distributions are leptokurtic; in particular when q = 1 the distribution is the double 
exponential. 

The likelihood ratio test criterion (Neyman & Pearson, 1928) for testing the null hypothesis of 
normality (q = 2) against the single alternative that the distribution is of the form of the above 
equation with q equal to some other specific value $q_1$ is readily found to be proportional to 

$$\lambda_{q_1} = (m_{q_1})^{1/q_1}/s, \qquad (4)$$

where $m_q$ is the qth absolute moment, 
$$m_q = \sum |y|^q/N, \quad\text{and in particular}\quad s = m_2^{1/2}. \qquad (5)$$
The inequality 
$$\lambda_{q_1} < \lambda_{q_1}(1-\alpha), \qquad (6)$$

where $\lambda(1-\alpha)$ and $\lambda(\alpha)$ refer respectively to the lower and upper significance 
points of the criterion, defines a set of cones lying along the co-ordinate axes when $q_1<2$ and a set 
of cones lying along the equiangular lines when $q_1>2$. That every such set defines a best critical 
region for the appropriate value of $q_1$ can be seen from the fact that on every* hyperspherical shell 
(4) and (6) define the region, independent of scale factors in the null distribution or in the 
alternative distribution, the boundary of which is given by the fundamental inequality 
$p_1 \ge kp_0$ of Neyman & Pearson (1933). This region on every such shell thus contains for its size 
the greatest possible concentration of probability density when the appropriate alternative hypothesis 
is true. 
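
The kurtosis of the family (1) and the criteria defined by (4) and (5) are simple to evaluate. The sketch below is an illustration only: the sample is hypothetical, the mean is taken as known and equal to zero as in the text, and the case of infinite q is handled as the largest absolute observation divided by s.

```python
import math


def gamma2(q):
    """Excess kurtosis of the family (1), from equation (2)."""
    return math.gamma(5.0 / q) * math.gamma(1.0 / q) / math.gamma(3.0 / q) ** 2 - 3.0


def lam(ys, q):
    """Criterion (4): m_q^(1/q) / s with the mean known to be zero."""
    n = len(ys)
    s = math.sqrt(sum(y * y for y in ys) / n)
    if math.isinf(q):
        return max(abs(y) for y in ys) / s
    m_q = sum(abs(y) ** q for y in ys) / n
    return m_q ** (1.0 / q) / s


sample = [0.3, -1.2, 0.8, 2.1, -0.5, -0.1]  # hypothetical observations
a = lam(sample, 1)             # mean deviation / standard deviation
b2 = lam(sample, 4) ** 4       # fourth moment / (standard deviation)^4
c = lam(sample, math.inf)      # largest absolute observation / s
print(gamma2(1), gamma2(2), gamma2(4))
print(a, b2, c)
```

The three special cases q = 1, q = 4 and infinite q correspond to the criteria written a, b2 and c in the discussion, and each is unchanged when every observation is multiplied by the same constant.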

We see that, a<a(1—«) defines the most powerful test when the alternative distribution is the double 
exponential, for which y, = 3, also 6,<6,(1—«) defines the most powerful test when the alternative 
distribution is p(y) = const. exp—(y/p)* for which y, = — 0-812. When the alternative distribution is 
rectangular (y, = — 1:2,q = 00) the test is defined by c<c(1—«a), where 


e=|yz|/s (7) 
and yy, is the observation largest in absolute magnitude. It should be noted that with this criterion we 
should be testing whether the largest deviate in absolute magnitude were too small 2nd the test would 
be based on the lower tail area of the distribution of c. The same «riterion, but using the upper tail area of 


* This is literally true if it is assumed as in (1) that gy<0o. In the limiting case when gp = 00, if 
we suppose a particular rectangular distribution of semi-range p to be the alternative and if r is the 
radius of a hemispherical shell, then the region defined on the shell will contain an amount of probability 
density greater than any other region of similar size if p<r<./Np. For all other values of r it will 
contain an amount of probability density equal to that of other regions of the same size. The test is 
thus still most powerful in this limiting case. 
c, has been proposed by Thompson (1935) as a test for a single outlier (see also Pearson & Chandra Sekar, 
1936). Sections of critical regions for this criterion are shown in Fig. 2(i). It will be noted that only the 
lower tail areas of c, $b_2$ and a define most powerful tests for the alternatives considered. In Fig. 2 the 
regions are differentiated by drawing in the axes of symmetry only for those sections which correspond 
to most powerful tests. We should expect c and $b_2$ to be particularly good criteria, therefore, when the 
alternative hypothesis was that the distribution showed marked platykurtosis; on the other hand, in 
testing for marked leptokurtosis a would be expected to be better. Pearson’s (1935) sampling experiments 
(his Table 7) which are shown again in table 4 of the article appearing on p. 318 of this journal 
support this suggestion (although with so few results no firm conclusions can be drawn). For the 
rectangular distribution a larger number of significant results is obtained with $b_2$, whilst for the double 
exponential a larger number is obtained with a. It does not of course follow that a would become better 
than $b_2$ as soon as $\gamma_2$ was greater than zero; the change-over might well be at some other point. 
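
The values of $\gamma_2$ quoted for the three alternatives can be checked with a short calculation. The following is a minimal sketch, assuming (as the references to q suggest) that the alternatives belong to the family with density proportional to exp(-|y/rho|^q), whose moments are ratios of gamma functions. The function name excess_kurtosis is introduced here purely for illustration.

```python
import math


def excess_kurtosis(q):
    """gamma_2 = mu_4 / mu_2**2 - 3 for a density proportional to exp(-|y/rho|**q).

    For this family E|y|**k is proportional to Gamma((k + 1)/q) / Gamma(1/q),
    so the scale rho cancels in the ratio (assumed family, see lead-in).
    """
    g = math.gamma
    return g(5.0 / q) * g(1.0 / q) / g(3.0 / q) ** 2 - 3.0


# q = 1: double exponential; q = 2: normal; q = 4: the exp{-(y/rho)^4} case;
# a very large q approximates the rectangular limit q = infinity.
for label, q in [("double exponential", 1.0), ("normal", 2.0),
                 ("exp(-(y/rho)^4)", 4.0), ("near-rectangular", 1000.0)]:
    print(f"{label:20s} q = {q:7.1f}  gamma_2 = {excess_kurtosis(q):+.3f}")
```

Under that assumption the sketch gives values close to 3, -0.812 and -1.2 for q = 1, 4 and large q respectively, with the normal distribution (q = 2) sitting at zero between them.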

In a study of possible tests of kurtosis of this type, and assuming the alternative distribution to be 
the symmetrical Gram-Charlier, Geary (1947) reached the conclusion that $b_2$ was the most ‘efficient’ 
test for infinitely large samples (and correspondingly small departures from normality). He also showed 
that, over a wide range of values of q, the power of the various possible tests would not be expected to be 
greatly different. The present research emphasizes that the choice of criterion for samples of finite size 
must depend on the type of alternative hypothesis which is in mind. The similarity of the critical regions 
for the various criteria confirms Geary’s second conclusion. 
The frequency justification of sequential tests—addendum 
By G. A. BARNARD, Imperial College 


In my paper (1952) on the above topic, at the end of §3 (p. 147) an inversion of limit operations was 
performed without explicit justification. This has caused difficulty for some readers, and a justification 
for this process is now appended. 

The omission of this proof in the first place was quite deliberate. In the opinion of the writer, statistics 
is a branch of applied mathematics in which undue attention to pure rigour is out of place. In other 
branches of applied mathematics, in mechanics, for example, one is expected to make use of experience 
to fill in what would otherwise be gaps in proofs. If a plane lamina is under consideration, and its surface 
density is given, one does not usually find the assumption explicitly mentioned that the lamina is 
a measurable set of points before use is made of the fact that the lamina will have a determinate mass. 
But in certain quarters great emphasis seems to be laid on just such assumptions in statistics. 

It is not that the present writer is opposed to purely mathematical arguments. It is, that such 
arguments should be kept in their proper place, and that we should not pretend we are doing statistics 
when we are really doing nothing but pure mathematics. 

In §5 of the paper, it is stated that ‘Jeffreys has not ... laid down general principles which determine 
the choice of prior distributions in cases of “ignorance”’. This should read ‘...which uniquely determine 
the choice...’. 

We now give the proof referred to above. 

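We wish to show that if 

    (a)  $\{1/\log(b/a)\} \int_a^b \alpha''(\sigma, a, b)\, d\sigma/\sigma = \alpha'$  for all $0 < a < b < \infty$,

and (b)  as $a \to 0$ and $b \to \infty$, $\alpha''(\sigma, a, b) \to \alpha$,

then $\alpha = \alpha'$. 
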
We notice first that $\alpha''(\sigma, a, b) = \alpha''(c\sigma, a/c, b/c)$ for any positive c. For if we make the changes 
$X \to cX$, $a \to a/c$, $b \to b/c$, then $\sigma \to c\sigma$, $s \to cs$; $sa$, $sb$ and $t$ remain unaltered, and so A,(a,b) remains 
unaltered. It follows that we can write $\alpha''(\sigma, a, b) = \alpha''(\sigma a, \sigma b)$. 

We now put $t = \log\sigma$, $l_1 = \log a$, $l_2 = \log b$ in the integral, and put $\alpha''(\sigma a, \sigma b) = f(t+l_1, t+l_2)$. Then 
(a) becomes 

    $\{1/(l_2 - l_1)\} \int_{l_1}^{l_2} f(t+l_1, t+l_2)\, dt = \alpha'$  for all $l_1 < l_2$,

while (b) gives $f(x, y) \to \alpha$ as $x \to -\infty$ and $y \to +\infty$. The second condition implies that, given any positive 
$\epsilon$, we can find an A such that 

    $|f(x, y) - \alpha| < \epsilon$ whenever $x < -A$ and $y > A$. 


Now in (a), put $l_1 = -nA$, $l_2 = +nA$, with $n > 1$. Then we have 

    $\alpha' = (1/2nA) \left\{ \int_{-nA}^{-(n-1)A} f(t-nA, t+nA)\, dt + \int_{-(n-1)A}^{(n-1)A} f(t-nA, t+nA)\, dt + \int_{(n-1)A}^{nA} f(t-nA, t+nA)\, dt \right\}$

    $= (1/2nA) \{ A f(t_1-nA, t_1+nA) + 2(n-1)A f(t_2-nA, t_2+nA) + A f(t_3-nA, t_3+nA) \}$, 

where $-nA < t_1 < -(n-1)A$, $-(n-1)A < t_2 < (n-1)A$ and $(n-1)A < t_3 < nA$, 
by the mean-value theorem. Now f is a probability, and is therefore bounded, so that as $n \to \infty$, the first 
and last terms within the braces remain bounded. Hence, by taking n large enough, we can make $\alpha'$ 
differ by less than $\epsilon$ from $f(t_2-nA, t_2+nA)$. But $t_2-nA < -A$ and $t_2+nA > A$, so that $f(t_2-nA, t_2+nA)$ 
differs by less than $\epsilon$ from $\alpha$. Hence $\alpha$ and $\alpha'$ differ by less than $2\epsilon$; since $\epsilon$ is arbitrary, $\alpha = \alpha'$. 
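
The averaging step in this proof can be illustrated numerically. The sketch below is not part of the original argument: it takes a hypothetical bounded function f(x, y) that tends to a limit ALPHA as x tends to minus infinity and y to plus infinity, and evaluates the averaged integral in (a) over widening symmetric intervals by the midpoint rule. The particular f, the value of ALPHA and the step count are assumptions chosen only for illustration.

```python
import math

ALPHA = 0.05  # hypothetical limiting value of f (an assumption for this sketch)


def sigmoid(x):
    """Numerically stable logistic function, avoiding overflow in exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def f(x, y):
    """A bounded function with f(x, y) -> ALPHA as x -> -inf and y -> +inf."""
    return ALPHA + 0.3 * sigmoid(x) + 0.2 * sigmoid(-y)


def averaged(l1, l2, steps=20000):
    """Midpoint-rule value of {1/(l2 - l1)} * integral over [l1, l2] of f(t + l1, t + l2) dt."""
    h = (l2 - l1) / steps
    total = 0.0
    for i in range(steps):
        t = l1 + (i + 0.5) * h
        total += f(t + l1, t + l2)
    return total * h / (l2 - l1)


# Widening the interval (l1, l2) = (-w, +w) drives the average towards ALPHA.
for w in (10.0, 100.0, 1000.0):
    print(w, averaged(-w, w))
```

For this choice of f the average approaches ALPHA as the interval widens, so an average that is the same for every interval, as condition (a) requires, could only equal ALPHA; this is the content of the conclusion that the two limits coincide.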
REVIEWS 


The Design and Analysis of Experiments. By O. Kempthorne. New York: John 
Wiley and Sons; London: Chapman and Hall. 1952. Pp. xix+631. 76s. 


Until two or three years ago there were few connected and comprehensive accounts of the theory and 
practice of experimental design. There were the classical monographs by Fisher and Yates and a vast 
journal literature on subsequent developments. To the statistician outside the field of agricultural 
experimentation, the ingenuity shown in devising more and more refined and specialized designs was 
indeed impressive, but the resulting complexity became, it must be admitted, somewhat baffling. 

There was clearly a need for a book which would give a unified account of all the designs, their 
advantages and disadvantages and their relation to general experimental principles, and which would 
connect all this with modern statistical theory. As is inevitable when a gap in the literature is disclosed, 
not one but several books were written to fill it. Quite apart from those recent statistical text-books 
which have given prominence to experimental design (among which that by Bancroft and Anderson is 
a notable example) at least four books entirely devoted to this subject have appeared in the last four 
years. (It is regrettable, incidentally, that their titles are virtually identical.) The first, by Mann 
(Analysis and Design of Experiments), concerns itself strictly with the mathematics underlying the 
analysis of different designs. The second is the well-known volume by Cochran and Cox. The third, by 
Kempthorne, is the one under review and the fourth, by Quenouille (The Design and Analysis of 
Experiment), only appeared in the United Kingdom in early 1953. It is notable, though in no way 
surprising, that three of these books are written (or partly written) by statisticians who have worked 
at Rothamsted. 

As might be expected with two books from the same publisher, the books by Cochran and Cox and 
by Kempthorne differ in approach. Professor Kempthorne says that ‘the basic requirements of the 
user of experimental designs have been satisfied by the book of Cochran and Cox....The aim of this 
book has been to give a description of the design of experiments from as broad a view as possible and 
to relate the subject matter of this field of statistics to the general theory of statistics and to the general 
problem of experimental inference.’ This should not be taken to mean that Cochran and Cox offer no 
guidance on general principles of design or on the advantages and otherwise of different designs, for 
they are useful and clear on both; but their book is a manual of experimental design, that by Kempthorne 
is an advanced text-book. It is the former which will be of greater service to most practical experi- 
mentalists; the latter which, by and large, will be of more interest to the theoretical statistician, 
whether as consultant or teacher. There is something to be said for using the two books in conjunction. 
Possibly because its approach is more difficult, Kempthorne’s text does not always attain the clarity 
of exposition which distinguishes Cochran and Cox’s book, and reference from one to the other often 
helps. In any case, this new book is not for the beginner, nor is it suitable for superficial and easy 
reference. It should be looked upon as a comprehensive and advanced text-book and studied with the 
concentration and patience such a book deserves. 

Its comprehensiveness is indeed remarkable and Professor Kempthorne must be congratulated for 
carrying through so successfully the enormous task of providing a systematic treatment of experi- 
mental designs. He begins with six more or less introductory chapters, the first two on experimental 
principles, the next on elementary statistical notions (the value of this chapter is open to doubt; it is 
too scanty for those without sufficient background and superfluous for others) and three on least 
squares and general linear hypothesis theory. Chapters 7 and 8 between them give an admirable 
treatment of randomization. Chapters 9 and 10 deal with the two basic designs, randomized blocks 
and latin squares, and are followed by one on plot technique. Chapter 12 is on the sensitivity of 
experiments, a subject which has had relatively little attention in other texts. Then follows the core 
of the book, namely, nine chapters dealing with factorial designs. After this come four chapters on 
lattice designs, two on incomplete block designs, one on the analysis of groups of experiments and 
a final one, chapter 29, on treatments applied in sequence. The mathematical theory is given for all 
the designs, together with explanations and numerical examples. Judged by any standard, the book 
is a fine achievement. 

As there will undoubtedly be future editions, it may be worth mentioning one general aspect which 
could be improved. Of some subjects treated by Professor Kempthorne one has the impression, not so 
much of a unified treatment, as of a series of notes scattered throughout the volume. The analysis of 
covariance is a case in point. It is first introduced in § 6.6 as a modification to the general two-way 
model; the mathematics for it are given at length, but no explanation is given of its purpose or use. 
This comes later, first in § 8.7 in the chapter on The Validity of Analyses of Randomized Experiments 
and then in § 9.9 in the chapter on Randomized Blocks. (The index does not refer the reader to § 8.7.) 
A certain amount of rearrangement on this, as on some other topics, would clarify the presenta- 
tion. The very first opportunity should also be taken for improving the index and the cross-referencing. 
The present index is quite inadequate and there is virtually no cross-referencing in the text. 

The theory and practice of experimental design has grown so much out of agricultural experimenta- 
tion that it is inevitable that the discussion and illustrations in this book refer almost wholly to this 
field. At several points, Professor Kempthorne remarks critically on methods used in the social sciences. 
It is of course true that experimentation in this field is very much more difficult than in agriculture. 
But it is not impossible nor are survey techniques as limited in their value as Professor Kempthorne 
suggests, in passing, in his first chapter. In the last year or two, attempts have been made to use 
experimental designs in investigations dealing with human populations and something has been learned 
about the difficulties and possibilities. For the statistician working in the social sciences, one of the 
most interesting problems is, to what extent experimental techniques, such as those discussed in this 
book, can be imported into his domain of study. For him, as for his statistical colleagues in other 
sciences and fields of activity, Professor Kempthorne’s book is a most welcome addition to the 
literature. 

C. A. MOSER 
London School of Economics 


Associated Measurements. By M. H. QUENOUILLE. Pp. x+242. Butterworth. 35s. 


The Design and Analysis of Experiment. By M. H. QUENOUILLE. Pp. xiii+356. 
Griffin. 36s. 


To have produced these two books within a comparatively short time is a notable achievement. The 
books have much in common; they are advanced non-mathematical text-books on statistical methods 
excellently illustrated with examples, mostly biological, which are handled with great skill and with 
a clear sense of the practical purpose of the statistical analysis. The main fault is an occasional 
carelessness of writing. 

The discussion is mostly in terms of specific examples and, while the account of these is remarkably 
clear, the reader without a sound background of theory may find it difficult to apply the more advanced 
methods to cases slightly different from those considered by Mr Quenouille. For example, the method 
of least squares, which underlies much of both books, is, so far as I could see, never mentioned and is 
certainly never explained. It is true that the stage-by-stage fitting of groups of constants in a least- 
squares model is often most expeditiously done by analysis of covariance, and that this technique is 
used repeatedly. But a statement in some of the more complicated cases of the form that is assumed for 
the underlying population would have given a unity which is at present lacking. An excellent point 
about both books is the care taken to explain the assumptions necessary for the validity of the various 
methods. 

Associated Measurements starts with a very good section on graphical analysis, including an account 
of non-parametric tests for association. The description here of tests of partial correlation is open to 
criticism. The author defines a coefficient of medial correlation between two variates x and y by 
dichotomizing the scatter diagram at the medians, finding the proportion, p, of observations in the 
positive quadrant, and putting 

    $\phi_{xy} = 4p - 1$.    (1)

For an infinite population $\phi_{xy}$ takes values plus and minus unity for perfect correlation, and zero for 
independence. Also when x and y are independent, the distribution of $\phi_{xy}$ does not depend on the form 
of the population, so that a non-parametric test of independence may be obtained. For a bivariate 
normal population $\phi_{xy}$ is related to the product-moment correlation coefficient, $\rho_{xy}$, by W. F. 
Sheppard’s formula 

    $\rho_{xy} = \sin(\tfrac{1}{2}\pi\phi_{xy})$.    (2)

Mr Quenouille next defines a partial medial correlation coefficient by 


foes Pov a Perrys 
bose GEE) ms 
No derivation of this formula is given, and although it is used consistently, one suspects that it is 
a misprint for 

    $\phi'_{xy.z} = \dfrac{\phi_{xy} - \phi_{xz}\phi_{yz}}{\sqrt{\{(1-\phi_{xz}^2)(1-\phi_{yz}^2)\}}}$,    (4)

derived by analogy with product-moment correlations. Both (3) and (4) can give quite misleading 
results. To see this it is enough to consider some particular trivariate normal populations. For example, 
suppose that x = y + z, where y and z are independently distributed in unit normal distributions. Then, 
conditionally on z, x is perfectly correlated with y. But $\phi_{xy.z} = \phi_{xy} = \tfrac{1}{2}$ and $\phi'_{xy.z} = 1/\sqrt{3}$, so that the 
partial $\phi$’s suggest that the correlation is unchanged, or only slightly improved, by holding z fixed. 
The situation is more satisfactory if, conditionally on z, x is independent of y. For, in this case, $\phi_{xy.z}$ 
and $\phi'_{xy.z}$, while not necessarily zero, are always appreciably smaller than $\phi_{xy}$. However the vanishing 
of $\phi_{xy.z}$ and $\phi'_{xy.z}$ does not imply that $\rho_{xy.z}$ is small. For example if $\rho_{xy} = 0.696$, $\rho_{xz} = \rho_{yz} = 0.891$, then 
$\rho_{xy.z} = -0.476$, although $\phi_{xy.z} = \phi'_{xy.z} = 0$. Also by taking $\rho_{xy}$, etc. near unity, $\rho_{xy.z}$ may be made 
arbitrarily near minus one, with $\phi_{xy.z} = \phi'_{xy.z} = 0$. Enough has been said to make it clear that the 
uncritical use of partial $\phi$ could be very misleading. 
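
The two numerical examples can be reproduced in a few lines. The sketch below assumes that the medial coefficient of a bivariate normal pair is obtained from the product-moment correlation by inverting Sheppard's relation, phi = (2/pi) arcsin(rho), and that the partial coefficient takes the product-moment form used for (4); the function names are introduced here for illustration only.

```python
import math


def medial_from_rho(rho):
    """Invert Sheppard's relation rho = sin(pi * phi / 2) for a bivariate normal pair."""
    return 2.0 * math.asin(rho) / math.pi


def partial(r_xy, r_xz, r_yz):
    """Partial coefficient of x and y given z, in the product-moment form of (4)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


# Example 1: x = y + z with y and z independent unit normals.
rho = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0), 0.0)  # (rho_xy, rho_xz, rho_yz)
phi = tuple(medial_from_rho(r) for r in rho)
print("example 1: partial rho =", partial(*rho), " partial phi =", partial(*phi))

# Example 2: rho_xy = 0.696, rho_xz = rho_yz = 0.891.
rho = (0.696, 0.891, 0.891)
phi = tuple(medial_from_rho(r) for r in rho)
print("example 2: partial rho =", partial(*rho), " partial phi =", partial(*phi))
```

In the first example the partial product-moment correlation is unity while the partial medial coefficient is only 1/sqrt(3); in the second the partial medial coefficient is very nearly zero while the partial product-moment correlation comes out close to the value quoted in the text.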

Similar remarks may, incidentally, be made about the coefficient of partial rank correlation given by 
Prof. M. G. Kendall in his Rank Correlation Methods. Although the theory of rankings derived by 
sampling an underlying normal population is only a very special case of the general theory of ranking, 
it is worth noting that in such cases Kendall’s partial $\tau$ may give a wrong idea of the form of the 
continuous population. The similarity with $\phi$ arises because $\bar{\tau}$, the expectation of $\tau$ in samples from 
a normal universe, is given by the analogue of (2), 

    $\rho = \sin(\tfrac{1}{2}\pi\bar{\tau})$.    (5)

When the underlying population is normal these difficulties may of course be avoided by 
transforming $\phi$ or $\tau$ by (2) or (5) before calculating the partial correlation. There are a number of objections 
to this as a general procedure. 

The second section of Associated Measurements is called Numerical Analysis and contains a brief 
account of analysis of variance, analysis of covariance and regression analysis. The next section, on 
Rapid Estimation and Analysis, includes chapters on the quick estimation of regressions by the coarse 
grouping of the observations, and on further uses of analysis of covariance. The last three chapters 
are entitled Analytical Complications, Time Series Analysis and Multivariate Analysis. The chapter 
on time series deals mainly with useful approximate methods based on correlation coefficients and 
analysis of covariance. The final chapter on multivariate analysis seems to me one of the best brief 
introductions to the subject at present available. 

The Design and Analysis of Experiment begins with a twenty-page account of the general principles 
of design. This is so terse that it is unlikely to make much appeal to an experimenter not already fairly 
familiar with the methods described. For example there is a bald statement that randomization is 
essential if unbiased estimates and unbiased estimates of error are to be obtained, but there is no 
attempt to explain in non-statistical language just what the consequences are of failure to randomize. 

The following chapters deal with randomized blocks, Latin squares, and with simple and general 
factorial and split-plot designs. The emphasis tends to be on the appropriate forms of analysis of 
variance and covariance. 

Section B consists of four chapters on incomplete block designs. The main types of confounding 
and fractional replication are well described. The advice (p. 111) that, except in special cases, the 
interactions to be confounded should be chosen at random from interactions of a given order, is 
surprising. Such a randomization is in any case quite pointless, and is dangerous in that it tends to 
distract attention from the great importance of a careful choice of the interactions to be confounded. 
The fairly general condemnation of partial confounding (p. 117), because of the complexity of the 
calculations involved in the analysis, is also open to question. 

Sections C and D entitled Long-term Policy and Experimental Complications contain a good deal 
of interesting material, the discussion of the combination of results from a series of experiments being 
particularly good. 

To sum up, while these books should be read critically, they are very useful and should appeal to 
anyone wanting a non-mathematical account of advanced statistical methods. 







D. R. COX 
The Statistics of Bio-assay. By C. I. Bliss. New York: Academic Press, Inc. 
Pp. iiti+ 184. $3.50. 


Statistical Method in Biological Assay. By D. J. Finney. London: Charles Griffin 
and Co. Ltd. Pp. xix +661. 68s. 


Probit Analysis. By D. J. Finney. Cambridge University Press. 2nd ed., pp. xiv + 318. 
35s. 


The statistical theory of bio-assay has been almost entirely developed during the past 20 years and 
has reached a reasonably stable state, so that the appearance of text-books by two of the foremost 
workers in this field is extremely welcome. The two general texts cover much the same ground, though 
Finney’s book is by far the more detailed; Bliss’s book originally appeared as a single chapter in 
György’s Vitamin Methods and is necessarily rather compressed. Both authors deal with parallel line 
assays with quantitative and quantal responses, slope-ratio assays, the use of covariance and the 
combination of potency estimates; Bliss has an interesting section on control charts, while Finney 
includes several topics which do not arise in vitamin work and so fall outside the scope of the shorter 
book. 

Inevitably in so brief a compass, Bliss’s book is little more than a collection of recipes—indeed he 
refers his readers to Finney for all mathematical details. Within this limitation it should prove very 
valuable. The needs of the practising biologist are always kept in mind, and the author is bold enough 
to suggest approximate rule-of-thumb techniques for important problems, rather than to decline to 
commit himself on the grounds that no exact solution is yet available. 

Finney’s book is substantial to the point of being encyclopaedic, and some of the theory set out is 
well ahead of practice, at any rate in the assay field. The reader is supposed to be reasonably familiar 
with the basic statistical techniques, but apart from this, the suggested analytical methods are 
explained in full detail with an excellent set of worked examples. The underlying logic of bio-assay is 
discussed at length, and a number of incomplete block designs are given which should prove of great 
practical use. Four chapters are devoted to quantal assays, and these are noteworthy for a very full 
discussion of the various alternatives to probit analysis that have been suggested from time to time. 
The book includes a 40 page Appendix of Tables, many of them new. 

The choice between these two books is not an easy one. Statisticians will probably prefer Finney. 
Biologists may like to keep Bliss’s book on the bench (not least because of its handy size), but a study 
of Finney’s book will be necessary for non-routine problems, and may well enable them to improve 
on their accepted techniques. 

A second edition of Finney’s Probit Analysis is very welcome, the first edition having been unobtain- 
able for some time. Few changes have been made in the original material, but a substantial new chapter 
on Recent Developments has been added, and some of the tabular matter is new (the table of weighting 
coefficients now covers natural mortality up to 90%!). Alternatives to probit analysis are still rather 
inadequately treated (as the author himself points out), but this has been remedied by the very full 
discussion in the later book. The omission of one or two other topics, such as Tocher’s analysis of 
grouped data and Finney’s own work on graphical methods and the adequacy of a single cycle of 
calculation, seems to be due to a long delay in publication—the preface to the new edition is dated 
December 1949. The standard of production leaves something to be desired; the new edition has been 
produced by an offset-litho process on rather thick paper, and the appearance of the page has suffered 
somewhat. It seems a pity that the price should still be almost double that of the original edition. 
Nevertheless, Probit Analysis remains an essential book for the many biologists and statisticians who 
work with sigmoid curves. 

M. J. R. HEALY 


Statistical Theory with Engineering Applications. By A. Hald. New York: 
Wiley; London: Chapman and Hall. 1952. Pp. xii+783. 72s. 

Statistical Tables and Formulas. By A. Hald. New York: Wiley; London: 
Chapman and Hall. 1952. Pp. 97. 20s. 


This book has as its aim the provision, in simple and logical form, of those parts of statistical methods 
which are of use to the engineer in his daily work. Nevertheless, in its 800 pages it covers considerably 
more ground, and that ground in greater detail, than might be expected from this simple aim. The result 
has been to produce a text-book which derives from first principles and illustrates with practical 
examples all the common statistical processes. 
The book assumes no previous knowledge, and although advanced mathematical techniques are 
avoided the treatment and mode of development is mathematical in form. Nearly all the important 
results derived mathematically are subsequently illustrated practically. The contents include 
probability, graphs, the fundamental distributions, the analysis of variance, sampling investigations, 
the analysis of regressions and sequential analysis. It is noteworthy that the various distributions 
which are studied take up nearly 350 pages, which gives some idea of the thoroughness of the treatment. 
The many practical examples that are provided to illustrate the theory are drawn almost exclusively 
from engineering sources but they are put down in such a way that no detailed knowledge of engineering 
technique is required and the reader could easily substitute his own examples from his particular sphere 
of applications. The only criticism that could be levelled against the examples is that they are all 
chosen to illustrate just one point at a time, and one feels that, at any rate towards the end of the 
book, more complex illustrations should be used bringing in simultaneously a number of points from 
the theoretical development. As it is, few practical problems that the engineer is likely to meet with 
will be as simple as those given in the book. Also given in conjunction with each piece of theory are 
references to original memoirs and papers which describe practical illustrations. The reviewer feels that 
such references, instead of being scattered, would be better gathered together at the end of each 
chapter. One other minor point is that far too many equations are numbered, thereby giving them 
a certain prestige value. For example, in Chapter 18 no less than 157 equations receive numbers and 
few are referred to again nor are they likely to be quoted. 

The particular portion of the book that struck the reviewer as one of the best of its kind was the 
section on the analysis of variance which is developed extremely logically. The hierarchical and cross- 
classification types of problem are discussed in some detail from the view-point of both the systematic 
and the random models with a very lucid explanation of the concept of an interaction. Later the case 
where there is a mixture of the two models is discussed. The technique of the analysis of variance has 
great value for the engineer and it is rather a pity that the treatment here is not quite as full as in other 
parts of the book. Two examples of this may suffice. The analysis of variance is built up on the twin 
assumptions that the residuals have equal variances and are normally distributed. If the former does 
not hold then a transformation can sometimes be applied and although this is mentioned on page 427 
the full implications are not brought home. The latter form of departure would seem at first sight to 
indicate that the usual F or z test will not give the true significance level. It has, however, been shown 
that the non-normality of the residuals does not, in fact, affect the theoretical significance levels very 
much. Nevertheless if this point is not discussed it is inclined to worry the student. Secondly a short 
account of the simpler techniques of probit analysis would have been an extremely useful addition to 
the chapter on linear regression. Perhaps also, in a book of this size, a fuller account of time series 
might have been expected. 

All authors have their own ideas as to the logical order of development of their subject. In this 
book the discontinuous distributions such as the binomial are dealt with in outline only before the 
main part of the book on continuous distributions and they are then returned to in more detail later. 
An alternative scheme, which would appear to have much in its favour, would have been to have dealt 
with all distributions in this manner and then to devote later chapters to problems of estimation, 
confidence limits, testing hypotheses and so on, bringing in the various distributions already discussed. 

As a companion to the book a paper-covered set of tables has been issued and there are many 
references in one volume to the other. These tables are partly statistical, such as those for the normal 
and t distributions, and those giving confidence limits for the binomial probability, and partly 
mathematical, such as those for log n!, square roots and reciprocals. All told there are nineteen tables and 
they cover the work in the book. Viewed in this light they are adequate and are set out in the most 
useful manner once the reader has got used to the word ‘fractile’ for significance level. They are, 
however, not full enough to stand by themselves as a book of statistical tables, especially since the 
introduction provided is only of very limited value without the book at hand as a guide. 

Summarizing, one may classify this book of Hald’s as a text-book, pure and simple. There are no 
discursions into the realms of debatable or controversial issues and hence it is a useful work which can 
be studied with much profit by a person intent on getting a sound working knowledge of statistics 
whether engineer or no. At the same time there is a risk that its length may put off all but the keenest 
student, particularly if he must find the money to buy it himself. 

P. G. MOORE 
Statistical Decision Functions. By A. Wald. New York: John Wiley and Sons, 
Inc.; London: Chapman and Hall, Ltd. 1950. Pp. ix+179. 44s. 


By the tragic death of Abraham Wald in December 1950 the world of mathematical statistics lost one 
of its most active and ingenious minds. The present book, published just before his death, shows him 
to have been at the height of his powers. 

To explain what a decision function is, and to introduce concretely the leading ideas defined by 
Wald, we will consider the following simple example: 

An urn contains black balls and white balls, and the proportion p of black balls is unknown. A fixed 
number n of balls is drawn, at random with replacement, and r of them are observed to be black. 
We are required to make a guess, p’, at the value of p, and if we are wrong we lose an amount of money 
proportional to $L = (p - p')^2$. 

Any rule by means of which we arrive at our guess will be a decision function for this situation. 
Since n is fixed, the only quantity on which our guess can depend is r, but the value of r need not 
uniquely determine our guess; we may make p’ depend also on the toss of a coin or some other random 
event. Thus examples of decision functions d(r) in this situation are: 

    (i) $d(r) = r/n$,    (ii) $d(r) = \tfrac{1}{2}$, 
    (iii) $d(r) = \lambda(r/n) + \mu(\tfrac{1}{2})$, for some $\lambda = 1 - \mu$, $0 < \lambda < 1$, 
    (iv) $d(r) = r/n$ with probability $\tfrac{1}{2}$, 
         $d(r) = \tfrac{1}{2}$ with probability $\tfrac{1}{2}$. 

In cases (i)-(iii) we take p’ = d(r), while in case (iv), which is an example of a randomized decision 
function, we toss a coin, and if it gives heads we take $p' = r/n$, while if it gives tails we take $p' = \tfrac{1}{2}$. 

If we could attach, to each decision function d, a definite monetary loss incurred by its use, we would 
find the problem of choosing between one decision function and another quite simple. We would simply 
choose that function which minimized the loss. But such a simple solution is impossible, for two reasons. 
For one thing, for a given value of p the value of r will vary randomly, and so the guess p’ will vary, 
and the loss L will vary along with p’; and for another thing, the distribution of p’ will in general 
depend on p, and, for a given p’, the loss L depends on p. Wald disposes of the first difficulty by noting 
that, for given p, and given d, the distribution of p’, and hence that of L, is fixed, so that we may 
evaluate the mean value of L as a function of p and d. This function Wald calls the risk function, 
$\rho(p, d)$. 

In the four examples we have given the values of $\rho$ are, writing $q = 1 - p$, 

    (i) $pq/n$,  (ii) $\tfrac{1}{4} - pq$,  (iii) $\lambda^2 pq/n + \mu^2(\tfrac{1}{4} - pq)$,  (iv) $\tfrac{1}{8} - pq(1-(1/n))/2$. 
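
These four risk functions can be checked by computing the expected loss directly from the binomial distribution of r. The following is a minimal sketch: each rule is represented, for illustration only, as a function returning a list of (probability, guess) pairs, so that the randomized rule (iv) fits the same pattern as the others. The values of n and of the weight (written lam in the code) are arbitrary.

```python
from math import comb


def exact_risk(rule, p, n):
    """Expected squared-error loss of a rule when r is binomial (n, p)."""
    total = 0.0
    for r in range(n + 1):
        weight = comb(n, r) * p ** r * (1.0 - p) ** (n - r)
        # rule(r) lists the possible guesses with their probabilities.
        total += weight * sum(prob * (p - guess) ** 2 for prob, guess in rule(r))
    return total


def rules(n, lam):
    """The four decision functions (i)-(iv), with mu = 1 - lam."""
    mu = 1.0 - lam
    return {
        "i": lambda r: [(1.0, r / n)],
        "ii": lambda r: [(1.0, 0.5)],
        "iii": lambda r: [(1.0, lam * (r / n) + mu * 0.5)],
        "iv": lambda r: [(0.5, r / n), (0.5, 0.5)],
    }


def closed_form(p, n, lam):
    """The risk functions as stated in the text, with q = 1 - p."""
    q, mu = 1.0 - p, 1.0 - lam
    return {
        "i": p * q / n,
        "ii": 0.25 - p * q,
        "iii": lam ** 2 * p * q / n + mu ** 2 * (0.25 - p * q),
        "iv": 0.125 - p * q * (1.0 - 1.0 / n) / 2.0,
    }


n, lam = 10, 0.7  # arbitrary illustrative values
for p in (0.0, 0.1, 0.5, 0.9):
    exact = {k: round(exact_risk(rule, p, n), 6) for k, rule in rules(n, lam).items()}
    print(p, exact)
```

The directly computed expectations agree with the four closed forms to rounding error for every p tried, which is a useful check on the variance-plus-squared-bias decomposition behind (iii).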


Wald then adopts the principle that it is reasonable to act so as to minimize the expected value of the 
loss, when this can be calculated. This implies that if we have two risk functions, $\rho(p, d_1)$, $\rho(p, d_2)$, 
the first of which is, for a given value of p, less than the second, then, for this value of p, the first 
decision function is ‘better’ than the second, in the sense that it is reasonable to act on the first 
decision function rather than the second. If the first decision function is sometimes better, and is 
never worse, than the second, Wald says the first is ‘uniformly better’ than the second. Taking our 
decision functions (i)-(iv), we see that (i) is best of all when p=0 or p=1, since then its risk function 
is zero; while (ii) is best of all when $p = \tfrac12$; on the other hand, when $p = \tfrac12$, (i) is worst of all, while when 
p=0 or 1, (ii) is worst of all. If we from now on take $\lambda = \sqrt{n}/(1 + \sqrt{n})$, the risk function (iii) becomes 
$\tfrac14 (1 + \sqrt{n})^{-2}$, which is independent of p and uniformly less than (iv). Thus (iii) is uniformly better than 
(iv), but it is not uniformly better than (i) or (ii); nor is either (i) or (ii) uniformly better than (iii). 
Thus, if we confine attention to these four decision functions, the principle of minimizing the 
mean loss enables us to rule out (iv) from consideration, but leaves us with (i), (ii) and (iii) to choose 
from. These three are what Wald calls ‘admissible’ decision functions, because for none of them does 
there exist another decision function which is uniformly better. (iv) is not admissible. If Mr A uses 
an admissible decision function to guide his conduct, no one who accepts the principle of minimizing 
mean loss can say that Mr A is acting unreasonably, since for some possible situations the rule Mr A 
is using is at least as good as any other. 
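
The relation ‘uniformly better’ can be tested numerically on a grid of values of p. The sketch below does this for rules (i), (ii) and (iii) only, using the closed-form risk functions with λ = √n/(1 + √n); the sample size n = 25, the grid and the function names are assumptions made for illustration. A grid can show that one rule fails to dominate another, since a single value of p at which it is worse suffices.

```python
from math import sqrt

def rho_i(p, n):
    return p * (1 - p) / n

def rho_ii(p, n):
    return 0.25 - p * (1 - p)

def rho_iii(p, n):
    # Constant risk obtained with lambda = sqrt(n) / (1 + sqrt(n)).
    return 0.25 / (1 + sqrt(n)) ** 2

def uniformly_better(rho_a, rho_b, n, grid, tol=1e-15):
    """True if rho_a is never larger than rho_b on the grid and is smaller somewhere."""
    diffs = [rho_a(p, n) - rho_b(p, n) for p in grid]
    return all(d <= tol for d in diffs) and any(d < -tol for d in diffs)

n = 25                                   # illustrative sample size
grid = [k / 200 for k in range(201)]     # p = 0, 0.005, ..., 1
risk_functions = {"i": rho_i, "ii": rho_ii, "iii": rho_iii}

dominance = {
    (a, b): uniformly_better(risk_functions[a], risk_functions[b], n, grid)
    for a in risk_functions
    for b in risk_functions
    if a != b
}
for pair, result in dominance.items():
    print(pair, result)
```

Every entry comes out false: each of the three rules has some value of p at which it beats each of the others.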

We can go further if we are prepared to rule out cases where it is impossible to minimize the mean 
loss—such as would arise in the example we have taken if we allowed n to vary and assumed that 
further observations cost nothing. In this latter case, for any rule leading to a decision, we could 
clearly get a better one by taking more observations. But if we rule out such cases, then accepting 
the principle of minimizing mean loss implies that we need consider as reasonable only admissible 
decision functions. Thus the problem of determining the class of all admissible decision functions is 
seen to be of central importance. 

Unfortunately, Wald was unable to formulate simple and general conditions which guarantee 
that exceptional cases having the same effect as the ‘free observations’ of the previous paragraph 
are ruled out. He therefore introduces, instead of the class of all admissible decision functions, 
the notion of a ‘complete class’ of decision functions. A class of decision functions is said to be 
complete if for any decision function d not in the class there is a decision function d’ in the class 
which is uniformly better than d. If we confine our attention to decision functions belonging to 
a complete class we shall not lose anything thereby, since those we are ignoring are worse than some 
we are considering. The set of all possible decision functions is itself a complete class; so that the idea 
of a complete class helps us to choose our decision function only in so far as we make our complete class 
as small as possible. Wald obtains here what is probably the best possible result, viz. that under very 
general conditions, the set of all Bayes solutions forms a complete class of decision functions. 

By a Bayes solution of a decision problem is meant a solution arrived at by giving the unknown 
parameter (p in our case) a prior distribution, and then using Bayes’s theorem to minimize the mean 
loss. For example, our decision function (iii) can be arrived at by giving p the prior probability 
function $K p^{\gamma-1} q^{\gamma-1}$, with $\gamma = \sqrt{n}/2$. When r black balls out of n have been observed, the posterior 
distribution of p is then given by the probability function 

$$K p^{r+\gamma-1} q^{n-r+\gamma-1} \Big/ \int_0^1 K p^{r+\gamma-1} q^{n-r+\gamma-1}\,dp,$$

and to minimize the mean squared error we take the mean of this distribution, $(r + \gamma)/(n + 2\gamma)$, which 
with the given value of $\gamma$ is the same as (iii). The decision function (ii) is a Bayes solution, since it can 
be obtained by taking it as a priori certain that $p = \tfrac12$. On the other hand, the decision function 
(iv) cannot be obtained as a Bayes solution since we shall always take the mean of the posterior 
distribution, and this will always be a definite number, not a random variable. Finally, (i) can be 
arrived at as a limit of Bayes solutions corresponding to the prior distributions given by $K/(pq)$ for 
$a < p < 1 - a$, as a tends to 0. A solution arrived at in this way, as a limit of Bayes solutions, is called 
by Wald a Bayes solution in the wide sense. His theorem is, that under very general conditions the 
class of all Bayes solutions in the wide sense is a complete class. 
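
Both Bayes constructions can be checked numerically. The sketch below assumes the symmetric prior is proportional to (pq)^(γ-1) with γ = √n/2, compares its posterior mean with rule (iii), and then approximates the posterior mean under the truncated prior K/(pq) by a midpoint rule; n = 9, r = 3, a = 10⁻⁶ and the function names are illustrative assumptions.

```python
from math import sqrt, isclose

def posterior_mean_symmetric(r, n, gamma):
    """Posterior mean of p for a prior proportional to (p*q)**(gamma - 1)."""
    return (r + gamma) / (n + 2 * gamma)

def rule_iii(r, n):
    """Rule (iii) with lambda = sqrt(n) / (1 + sqrt(n)) and mu = 1 - lambda."""
    lam = sqrt(n) / (1 + sqrt(n))
    return lam * r / n + (1 - lam) * 0.5

def posterior_mean_truncated(r, n, a, steps=20_000):
    """Posterior mean for the prior K/(p*q) on a < p < 1 - a (midpoint rule)."""
    h = (1 - 2 * a) / steps
    numerator = denominator = 0.0
    for k in range(steps):
        p = a + (k + 0.5) * h
        # Likelihood p^r q^(n-r) times prior 1/(p q); K cancels in the ratio.
        weight = p ** (r - 1) * (1 - p) ** (n - r - 1)
        numerator += p * weight
        denominator += weight
    return numerator / denominator

n = 9                                    # illustrative sample size
agrees_with_iii = [
    isclose(posterior_mean_symmetric(r, n, sqrt(n) / 2), rule_iii(r, n))
    for r in range(n + 1)
]
near_limit = posterior_mean_truncated(3, n, 1e-6)

print(all(agrees_with_iii))
print(near_limit, 3 / n)
```

For 0 < r < n the truncated-prior mean is already very close to r/n for small a, which is the sense in which (i) is a limit of Bayes solutions.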

The simple example we have taken does not indicate in the least the extreme degree of generality 
which attaches to this result of Wald’s. Some idea of this generality can be obtained from the 
consideration that about half the present book is devoted to the statement of the conditions under which 
the theorem is true, and to its proof. In our example, the successive observations are statistically 
independent; this restriction is removed. Again, in our case, the number of observations was fixed in 
advance; this restriction is removed, and in its place very general assumptions are made to the effect 
that observations cost money. The possibility is allowed that observations of several different kinds 
can be made. In our case the problem reduced to the estimation of a single parameter; this restriction 
is removed, so that we need not even be estimating any finite number of parameters, and, if we are, 
we need not make point estimates of them. Finally, included in the decision function is not only the 
final decision to be arrived at, but also which observations are to be made at each stage of the 
experiment, and when experimentation is to be stopped, etc. It is, in fact, difficult to see in what direction 
any serious possibility of generalization remains. And it is, of course, precisely in its degree of generality 
that the value of the theorem lies. Its meaning can be summed up by saying that, if we are prepared 
to accept the principle of minimizing mean loss, when this is known, then we should use Bayes’s 
theorem to guide us, based on some a priori distribution. 

Thus we would conjecture that a major result of this work of Wald’s will be to reinstate Bayes’s 
theorem to the central place it always should have occupied in the theory of decision functions. It 
lost this place, probably, because of Fisher’s attacks on its use in statistical inference. Fisher was 
quite right in asserting that Bayes’s theorem had no application to the problems of statistical inference 
with which Fisher, and the majority of statisticians after him, have been concerned. But this was taken 
on many sides to mean that the theorem had no application to the problems, such as those of sampling 
inspection, which could rightly be regarded primarily as decision problems rather than inference 
problems. 

Another theme which occurs in the present work is that of the ‘minimax’ decision function. One of 
the difficulties that has to be faced by anyone who tries to identify the decision problem and the 
inference problem is, that we have a feeling that the inference to be drawn from a given set of data in 
relation to a given set of statistical hypotheses should be unique (though it may be uncertain). And 
the trouble with Bayes solutions is that there are as many of them as there are prior distributions, in 
general (apart from cases, for example, where a uniformly most powerful test exists). So some principle 
has to be adopted which will pick out from the multiplicity of Bayes solutions one which can in some 
sense or other be regarded as the ‘true’ solution. Bayes and Laplace, of course, did this by their famous 
‘equal distribution of ignorance’ postulate. Now Wald suggests, instead, the principle of ‘minimizing 
the maximum loss’. This means choosing that decision function for which the maximum of the risk 
function is least. In the example we have taken, this means using the decision rule (iii). For we have 
already seen that this rule is a Bayes rule, and hence admissible. This means that no other rule can 
have a risk function which is never greater than the risk function for this rule. But since the risk 
function for rule (iii) is a constant, this means that any other risk function must have its maximum 
value greater than or equal to this constant. In other words, rule (iii) minimizes the maximum loss. 
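
The minimax property can be illustrated, though not proved, by a search restricted to the linear rules λ(r/n) + μ(½), which include (i) at λ = 1 and (ii) at λ = 0. The sketch below takes n = 16 and a grid of λ in steps of 0.01; both choices and the function name are assumptions made for illustration.

```python
from math import sqrt

def max_risk_linear(lam, n, points=201):
    """Largest value over a grid of p of lam^2 pq/n + mu^2 (1/4 - pq)."""
    mu = 1 - lam
    worst = 0.0
    for k in range(points):
        p = k / (points - 1)
        pq = p * (1 - p)
        worst = max(worst, lam**2 * pq / n + mu**2 * (0.25 - pq))
    return worst

n = 16                                        # illustrative sample size
lams = [k / 100 for k in range(101)]          # candidate values of lambda
best_lam = min(lams, key=lambda lam: max_risk_linear(lam, n))
minimax_lam = sqrt(n) / (1 + sqrt(n))         # value of lambda used in rule (iii)
constant_risk = 0.25 / (1 + sqrt(n)) ** 2

print(best_lam, minimax_lam)
print(max_risk_linear(1.0, n), max_risk_linear(0.0, n), constant_risk)
```

Within this family the search settles on λ = √n/(1 + √n). The argument in the text is stronger, since it covers all decision rules and not only the linear ones.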

Minimax rules have useful properties, perhaps the most useful being, as in our example, that the risk 
function for such rules is usually constant. But the principle of adopting the minimax rule cannot be 
regarded as any less arbitrary than Bayes’s and Laplace’s principle of the equal distribution of the 
prior probability. If, in a long series of repetitions of the problem we have considered, in fact p has 
a distribution, concentrated near the ends of its range, then decision rule (i) will in fact be the best of 
those considered; and if in fact there is a concentration towards $\tfrac12$, then perhaps (iii), or even (ii), will 
be better than (i). The prior distribution really is a feature of the problem, and without knowing it we 
cannot obtain a unique solution, which has any absolute title to being called the ‘best’. 

From the point of view of statisticians mainly concerned with problems of inductive inference, 
perhaps the most useful aspect of Wald’s main result is, that in so far as every admissible rule is 
a Bayes rule, since the data enter into the posterior probability only via the likelihood function, this 
represents another way of seeing that the likelihood function ‘contains all the information in the data’, 
and hence that the problem of inference reduces to the problem of describing the likelihood function. 


G. A. BARNARD 


PUBLICATIONS OF U.S. DEPARTMENT OF COMMERCE, 
NATIONAL BUREAU OF STANDARDS 


(i) Table of arctangents of rational numbers. Applied Mathematics Series 11. 1951. 
Pp. 105. Price $1.50. 


This table gives arctangents for rational arguments m/n, i.e. it provides $\tan^{-1}(m/n)$ to 12 decimals for 
all pairs of integers m, n with 0<m<n<100. Also shown are $\cot^{-1}(m/n) = \tfrac12\pi - \tan^{-1}(m/n)$ (to 12 
decimals) and the sum of squares $m^2 + n^2$. 

The main applications of the table are: 

(a) The conversion of cartesian co-ordinates n, m to polars $\phi = \tan^{-1}(m/n)$ and $r = \sqrt{m^2 + n^2}$, for 
which an auxiliary table of square roots would be required. 

(b) The evaluation of natural logarithms for complex arguments from 

$$\log_e(n + im) = \tfrac12 \log_e(m^2 + n^2) + i \tan^{-1}(m/n),$$

for which an auxiliary table of natural logarithms would be required. 

(c) The evaluation of the $\Gamma$-function for complex arguments from the recurrence 

$$\log_e \Gamma(x + iy + 1) = \tfrac12 \log_e(x^2 + y^2) + i \tan^{-1}(y/x) + \log_e \Gamma(x + iy).$$

The latter property may make the table useful in statistical work in connection with moment generating 
functions of $\Gamma$-variables or Laguerre series. 
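
Applications (a) and (b) can be reproduced with standard library functions in place of the printed table. This is a minimal sketch, assuming n > 0 so that the principal value of the arctangent is the correct angle; the point (n, m) = (4, 3) and the function names are illustrative. The last lines check the recurrence in (c) only in the real case y = 0, since the standard library has no complex Γ-function.

```python
import cmath
from math import atan, isclose, lgamma, log, sqrt

def polar_from_arctan(m, n):
    """Polar co-ordinates (r, phi) of the cartesian point (n, m), for n > 0."""
    return sqrt(m * m + n * n), atan(m / n)

def complex_log(m, n):
    """log_e(n + i m) = (1/2) log_e(m^2 + n^2) + i arctan(m/n), for n > 0."""
    return complex(0.5 * log(m * m + n * n), atan(m / n))

m, n = 3, 4                          # illustrative integers
r, phi = polar_from_arctan(m, n)
z = complex(n, m)

log_from_formula = complex_log(m, n)
log_from_library = cmath.log(z)

# Real special case (y = 0) of the recurrence in (c).
x = 2.5
gamma_recurrence_holds = isclose(lgamma(x + 1), log(x) + lgamma(x))

print(r, phi)
print(log_from_formula, log_from_library)
print(gamma_recurrence_holds)
```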
Auxiliary tables provide formulae for the ‘reduction’ of arctangents in the forms 

$$\tan^{-1}(m/n) = \sum_i f_i \tan^{-1} r_i \quad \text{(Table 1, col. 3)},$$
$$\tan^{-1} m = \sum_i f_i \tan^{-1} m_i \quad \text{(Table 2)},$$

by providing the integers $f_i$, $r_i$ and $m_i < m$. The table is provided with a short Introduction prepared by 
J. Todd and giving theoretical foundations and details of interpolation. 
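
Reductions of this kind can be verified numerically. The identities in the sketch below are standard ones chosen for illustration and are not quoted from the tables under review; the helper `reduction_sum` is a hypothetical name.

```python
from math import atan, isclose

def reduction_sum(terms):
    """Evaluate the sum of f_i * arctan(r_i) over (f_i, r_i) pairs."""
    return sum(f * atan(r) for f, r in terms)

# Each entry pairs an argument with a candidate reduction of its arctangent.
examples = {
    3: [(3, 1), (-1, 2)],             # arctan 3 = 3 arctan 1 - arctan 2
    7: [(2, 2), (-1, 1)],             # arctan 7 = 2 arctan 2 - arctan 1
    1 / 7: [(2, 1 / 2), (-1, 1)],     # arctan(1/7) = 2 arctan(1/2) - arctan 1
}

verified = {
    argument: isclose(atan(argument), reduction_sum(terms), abs_tol=1e-12)
    for argument, terms in examples.items()
}
for argument, ok in verified.items():
    print(argument, ok)
```

In the first two examples every argument on the right is an integer smaller than the one on the left, as in the second of the two forms.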


(ii) Tables of the exponential function $e^x$. Applied Mathematics Series 14 (supersedes 
MT 2). 1951. Pp. 537. Price $3.25. 


This table constitutes a revised edition of the second volume of the series of mathematical tables 
published by the Work Projects Administration of the City of New York and was originally published 
in 1939. It represents the most extensive table of the exponential function available if one takes into 
consideration both the decimal accuracy in $e^x$ and the fineness of the tabular interval in x. 
It provides the following functions: 

$e^x$                 to 18 decimals             for x = 0.0000 (0.0001) 1.0000 
                      to 15 decimals             for x = 1.0000 (0.0001) 2.5000 
                      to 15 decimals             for x = 2.500 (0.001) 5.000 
                      to 15 decimals             for x = 5.00 (0.01) 10.00 
$e^{-x}$              to 18 decimals             for x = 0.0000 (0.0001) 2.5000 
$e^x$ and $e^{-x}$    to 18 decimals             for x = 0 ($10^{-6}$) $10^{-4}$ 
                      to 19 significant figures  for x = 1 (1) 100 


and auxiliary tables. 

It should be noted that $e^x$ and $e^{-x}$ are given in separate tables and, if both are required for the same 
argument, two separate pages have to be consulted. A common argument for x < 2.5 would have been 
preferable. 


(iii) A guide to tables of the normal probability integral. Applied Mathematics 
Series 21. 1952. Pp. 16. Price 15c. 


This issue of the series provides, in Part I, a brief index of tables of the normal probability integral. In 
addition to the reference, information is provided on the type of area tabulated (i.e. whether single 
tail, two tail, etc.) the decimal accuracy, the tabular intervals and auxiliary quantities (e.g. differences) 
provided for interpolation. Lay-out and arrangement of this index are therefore similar to those adopted 
in An Index of Mathematical Tables by Fletcher, Miller & Rosenhead (Scientific Computing Service Ltd., 
London and McGraw-Hill Book Co., New York, 1946), although the present list of specialized tables is 
more extensive. 

Part II gives the corresponding index for tables of the inverse function (normal deviates). 

Part III gives principles of interpolation applicable to these tables in general and Part IV a summary 
of formulae. 


(iv) Tables of the Bessel functions $Y_0(x)$, $Y_1(x)$, $K_0(x)$, $K_1(x)$. Applied Mathematics 
Series 25 (supersedes A.M.S. 1). 1952. Pp. 60. Price 40c. 


This table represents a subtabulation of the Bessel functions $Y_0$, $Y_1$, $K_0$, $K_1$ given at a rather wider 
interval in vol. VI of the mathematical tables series issued by the British Association for the Advancement 
of Science; certain recalculations were made for small x. The functions tabled are 

$Y_0(x)$ and $Y_1(x)$ for x = 0.0001 (0.0001) 0.050 (0.001) 1.000 
$K_0(x)$ and $K_1(x)$ for x = 0.0001 (0.0001) 0.033 (0.001) 1.000 


to the same (varying) decimal accuracy as in the British Association tables with an additional 
interpolation tolerance of 1-2 units. For small x auxiliary functions are tabled which enable the Bessel 
functions to be evaluated with the help of log tables. An Introduction by A. N. Lowan gives basic 
formulae and methods of interpolation mainly based on the tabled differences. 

H. O. HARTLEY. 


Additional Note by Editor. Since these reviews were written an additional table in this series has been 
received, which is a re-issue of Table 16 of the N.B.S. Mathematical Tables Project. 


(v) Table of arctan x. Applied Mathematics Series 26. 1952. Pp. 188. Price $1.75. 


The integral 

$$\operatorname{Arctan} x = \int_0^x \frac{du}{1 + u^2}$$

is tabled to 12 decimal places for the following ranges and argument intervals: 
x = 0.000 (0.001) 7.000 (0.01) 50.00 (0.1) 300.0 (1) 2000 (10) 10,000. 
Second differences are given throughout. 
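
The defining integral can be evaluated directly and compared with a library arctangent. The sketch below uses composite Simpson quadrature, a method chosen here for illustration and not stated in the review; the number of panels and the test arguments are likewise assumptions.

```python
from math import atan

def arctan_by_simpson(x, panels=2000):
    """Composite Simpson approximation to the integral of du/(1 + u^2) from 0 to x."""
    if panels % 2:
        panels += 1                      # Simpson's rule needs an even panel count
    h = x / panels
    total = 1.0 + 1.0 / (1.0 + x * x)    # integrand at the two end points
    for k in range(1, panels):
        u = k * h
        total += (4 if k % 2 else 2) / (1.0 + u * u)
    return total * h / 3

arguments = [0.5, 1.0, 7.0]              # illustrative values within the tabled range
errors = {x: abs(arctan_by_simpson(x) - atan(x)) for x in arguments}

for x, err in errors.items():
    print(x, arctan_by_simpson(x), err)
```

The error grows with x for a fixed number of panels, so larger arguments would need a finer subdivision.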